Understandable Statistics Solutions Manual Sosany2

March 17, 2018 | Author: hotboanga | Category: Level Of Measurement, Sampling (Statistics), Mode (Statistics), Mean, Median



Part IV: Complete Solutions, Chapter 1
Copyright © Houghton Mifflin Company. All rights reserved.

Chapter 1: Getting Started

Section 1.1

1. Individuals are people or objects included in the study, whereas a variable is a characteristic of the individual that is measured or observed.
2. Nominal data are always qualitative.
3. A parameter is a numerical measure that describes a population. A statistic is a numerical value that describes a sample.
4. If the population does not change, a parameter will not change. Thus, for a fixed population, parameter values are constant. If we take three samples of the same size from a population, the values of the sample statistics will almost surely differ.
5. (a) The variable is the response regarding frequency of eating at fast-food restaurants.
(b) The variable is qualitative. The categories are the number of times one eats in fast-food restaurants.
(c) The implied population is responses for all adults in the United States.
6. (a) The variable is miles per gallon.
(b) The variable is quantitative because arithmetic operations can be applied to the mpg values.
(c) The implied population is gasoline mileage for all new cars.
7. (a) The variable is the nitrogen concentration (milligrams of nitrogen/liter of water).
(b) The variable is quantitative because arithmetic operations can be applied to nitrogen concentration.
(c) The implied population is all the lakes in the wetlands.
8. (a) The variable is the number of ferromagnetic artifacts per 100 square meters.
(b) The variable is quantitative because arithmetic operations can be applied to the number of artifacts.
(c) The implied population is the number of ferromagnetic artifacts per each distinct 100-square-meter plot in the Tara region.
9. (a) Length of time to complete an exam is a ratio level of measurement. The data may be arranged in order, differences and ratios are meaningful, and a time of 0 is the starting point for all measurements.
(b) Time of first class is an interval level of measurement. The data may be arranged in order, and differences are meaningful.
(c) Major field of study is a nominal level of measurement. The data consist of names only.
(d) Course evaluation scale is an ordinal level of measurement. The data may be arranged in order.
(e) Score on last exam is a ratio level of measurement. The data may be arranged in order, differences and ratios are meaningful, and a score of 0 is the starting point for all measurements.
(f) Age of student is a ratio level of measurement. The data may be arranged in order, differences and ratios are meaningful, and an age of 0 is the starting point for all measurements.
10. (a) Salesperson’s performance is an ordinal level of measurement. The data may be arranged in order.
(b) Price of company’s stock is a ratio level of measurement. The data may be arranged in order, differences and ratios are meaningful, and a price of 0 is the starting point for all measurements.
(c) Names of new products is a nominal level of measurement. The data consist of names only.
(d) Room temperature is an interval level of measurement. The data may be arranged in order, and differences are meaningful.
(e) Gross income is a ratio level of measurement. The data may be arranged in order, differences and ratios are meaningful, and an income of 0 is the starting point for all measurements.
(f) Color of packaging is a nominal level of measurement. The data consist of names only.
11. (a) Species of fish is a nominal level of measurement. The data consist of names only.
(b) Cost of rod and reel is a ratio level of measurement. The data may be arranged in order, differences and ratios are meaningful, and a cost of 0 is the starting point for all measurements.
(c) Time of return home is an interval level of measurement. The data may be arranged in order, and differences are meaningful.
(d) Guidebook rating is an ordinal level of measurement. The data may be arranged in order.
(e) Number of fish caught is a ratio level of measurement. The data may be arranged in order, differences and ratios are meaningful, and 0 fish caught is the starting point for all measurements.
(f) Temperature of the water is an interval level of measurement. The data may be arranged in order, and differences are meaningful.
12. Form B would be better. Statistical methods can be applied to the ordinal data obtained from Form B but not to the answers obtained from Form A.
13. (a) Answers vary. Ideally, weigh the packs in pounds using a digital scale that has tenths of pounds for accuracy.
(b) Some students may refuse to allow the weighing.
(c) Informing students before class may cause students to remove items before class.

Section 1.2

1. In stratified samples, we select a random sample from each stratum. In cluster sampling, we randomly select clusters to be included, and then each member of the selected cluster is sampled.
2. In simple random samples, every sample of size n has an equal chance of being selected. In a systematic sample, the only possible samples are those including every kth member of the population with respect to the random starting position.
3. Sampling error is the difference between the value of the population parameter and the value of the sample statistic that stems from the random selection process. Certainly, larger boxes of cereal will cost more than small boxes of cereal.
4. (a) Yes, your seating location and the randomized coin flip ensure equal chances of being selected.
(b) Not when the described method of selection is used. This is not a simple random sample; it is a cluster sample.
(c) Simply assign each student a number 1, 2, . . . , 40 and use a computer or a random-number table to select 20 students.
5.
Simply use a computer or random-number table to randomly select n students from the class after numbers are assigned.
(a) Answers vary. Perhaps they are excellent students who make a special effort to get to class early.
(b) Answers vary. Perhaps they are busy students who are never on time to class.
(c) Answers vary. Perhaps students in the back row are introverted.
(d) Answers vary. Perhaps tall students generally are healthier.
6. (a) Sick students and those who are skipping class cannot be sampled.
(b) Home-schooled students, homeless students, and dropouts cannot be sampled.
7. Answers vary.
8. Answers vary.
9. Answers vary.
10. Answers vary. Perhaps use 0, 1, 2, 3, 4 to indicate H and 5, 6, 7, 8, 9 to indicate T.
11. (a) It is appropriate. Certainly we can roll a 1 more than once in 20 rolls. The fourth roll was 2.
(b) No, simulated rolls of the die are random events, and we certainly would expect a different sequence.
12. Answers vary. We do expect at least one match on birthdays in more than 50% of the runs of this experiment.
13. Answers vary.
14. Answers vary.
15. (a) This technique is simple random sampling. Every sample of size n from the population has an equal chance of being selected, and every member of the population has an equal chance of being included in the sample.
(b) This technique is cluster sampling. The state, Hawaii, is divided into ZIP Codes. Then, within each of the 10 selected ZIP Codes, all businesses are surveyed.
(c) This technique is convenience sampling. This technique uses results or data that are conveniently and readily obtained.
(d) This technique is systematic sampling. Every fiftieth business is included in the sample.
(e) This technique is stratified sampling. The population was divided into strata based on business type. Then a simple random sample was drawn from each stratum.
16. (a) This technique is stratified sampling.
The population was divided into strata (four categories of length of hospital stay), and then a simple random sample was drawn from each stratum.
(b) This technique is simple random sampling.
(c) This technique is cluster sampling. There are five geographic regions, and some facilities from each region are selected randomly. Then, for each selected facility, all patients on the discharge list are surveyed to create the patient satisfaction profiles.
(d) This technique is systematic sampling. Every 500th patient is included in the sample.
(e) This technique is convenience sampling. This technique uses results or data that are conveniently and readily obtained.

Section 1.3

1. Answers vary. People with higher incomes likely will have high-speed Internet access, which will lead to spending more time on-line. Spending more time on-line might lead to spending less time watching TV. Thus, spending less time watching TV cannot be attributed solely to high income or high-speed Internet access.
2. A double-blind procedure would entail neither the patients nor those administering the treatments knowing which patients received which treatments. This process should eliminate potential bias from the treatment administrators and from patient psychology regarding benefits of the drug.
3. (a) This is an observational study because observations and measurements of individuals are conducted in a way that doesn’t influence the response variable being measured.
(b) This is an experiment because a treatment is deliberately imposed on the bighorn sheep in order to observe a possible change in heartworm prevention.
(c) This is an experiment because a treatment is deliberately imposed on the fishermen in order to observe a possible change in the length of fish in the river.
(d) This is an observational study because observations of the turtles are conducted in a way that doesn’t change the response being measured.
4. (a) Sampling was used in the hospitals.
(b) A computer simulation was used to mimic flight.
(c) A census was used because all data were used by the NFL.
(d) This was an experiment; patients were assigned a treatment, and the change in precancerous lesions was measured.
5. (a) Use random selection to pick 10 calves to inoculate. After inoculation, test all calves to see if there is a difference in resistance to infection between the two groups. No placebo is being used.
(b) Use random selection to pick nine schools to visit. After the police visits, survey all the schools to see if there is a difference in views between the two groups. No placebo is being used.
(c) Use random selection to pick 40 volunteers for the skin patch with the drug. A placebo patch is used for the remaining 35 volunteers in the second group. Then record the smoking habits of all volunteers to see if a difference exists between the two groups.
6. (a) “Over the last few years” could mean 2 years, 3 years, 7 years, etc. A more precise phrase is “Over the past 5 years.”
(b) If a respondent is first asked, “Have you ever run a stop sign,” chances are that their response to the question, “Should fines be doubled,” will change. Those who run stop signs probably don’t want the fine to double.
(c) When only yes or no are possible, most people likely will choose no. When rarely, sometimes, and frequently are possible, most people likely will choose rarely or sometimes.
7. Based on the information, Scheme A will be better because the blocks are similar. The plots bordering the river should be similar, and the plots away from the river should be similar.

Chapter 1 Review

1.
(a) Stratified
(b) All undergraduates at the specific campus studied
(c) The variable is number of hours worked. It is quantitative, at the ratio level of measurement.
(d) The variable is career applicability. It is qualitative, at the ordinal level of measurement.
(e) It is a statistic.
(f) The nonresponse rate is 60%, and it most likely will introduce bias into the study because those who do not answer may have different experiences than those who do answer.
(g) Probably not. These results are most applicable only to the campus in the study.
2. The implied population is all the listeners (or even all the voters). The variable is the voting preference of a caller. There is probably bias in the selection of the sample because those with the strongest opinions are most likely to call in.
3. Using the random-number table, pick seven digits at random. Digits 0, 1, and 2 can correspond to “Yes,” and digits 3, 4, 5, 6, 7, 8, and 9 can correspond to “No.” This will effectively simulate a random draw from a population with 30% TIVO owners.
4. (a) Cluster
(b) Convenience
(c) Systematic
(d) Simple random
(e) Stratified
5. (a) This is an observational study because no treatments were applied.
(b) This is an experiment because a treatment was applied (test type) and the results then were compared.
6. (a) Randomly select 500 donors to receive the literature and 500 donors to receive the phone call. After the donation collection period, compare the average amount or total amount collected from each of the two treatment groups.
(b) Randomly select the 43 adults to be given the treatment gel and the 42 adults to receive the placebo gel. After the treatment period, compare the whiteness of the two groups. To make this double blind, neither the treatment administrators nor the patients would know which gel the patients are receiving.
7. Questions should be worded in a clear, concise, and unbiased manner. No questions should be misleading. Commonsense rules should be stated for any numerical answers.
8.
No response required.
9. (a) This is an experiment; the treatment was the amount of light given to the colonies.
(b) We can assume that the normal-light group is the control group because this simulates normal light patterns for the fireflies. Therefore, the constant-light group is the treatment group.
(c) Number of fireflies alive at the end of the study
(d) Ratio

Chapter 2: Organizing Data

Section 2.1

1. Class limits are possible data values, and they specify the span of data values that fall within a class. Class boundaries are not possible data values; they are values halfway between the upper class limit of one class and the lower class limit of the next class.
2. Each data value must fall into one class. Data values above 50 do not have a class.
3. The classes overlap. A data value such as 20 falls into two classes.
4. These class widths are 11.
5. (a) Yes.
(b) [Figure: Histogram of Highway mpg. x-axis: Highway mpg, class boundaries 16.5 to 40.5; y-axis: Frequency.]
6. (a) [Figure: Histogram of Salary. x-axis: Salaries, class boundaries 23.5 to 253.5; y-axis: Frequency.]
(b) Yes. Yes.
(c) [Figure: Histogram of Salary. x-axis: Salaries, class boundaries 23.5 to 68.5; y-axis: Frequency.]
7. (a) Class width = 25
(b)
Class Limits   Class Boundaries   Midpoint   Frequency   Relative Frequency   Cumulative Frequency
236–260        235.5–260.5        248        4           0.07                 4
261–285        260.5–285.5        273        9           0.16                 13
286–310        285.5–310.5        298        25          0.44                 38
311–335        310.5–335.5        323        16          0.28                 54
336–360        335.5–360.5        348        3           0.05                 57
(c) [Figure: Histogram of Finish Times. x-axis: Finish Times; y-axis: Frequency.]
(d) [Figure: Histogram of Finish Times. x-axis: Finish Times; y-axis: Relative Frequency.]
(e) This distribution is slightly skewed to the left but fairly mound-shaped, symmetric.
(f) [Figure: Ogive.]
8. (a) Class width = 11
(b)
Class Limits   Class Boundaries   Midpoint   Frequency   Relative Frequency   Cumulative Frequency
45–55          44.5–55.5          50         3           0.0429               3
56–66          55.5–66.5          61         7           0.1000               10
67–77          66.5–77.5          72         22          0.3143               32
78–88          77.5–88.5          83         26          0.3714               58
89–99          88.5–99.5          94         9           0.1286               67
100–110        99.5–110.5         105        3           0.0429               70
(c) [Figure: Histogram of GLUCOSE. x-axis: GLUCOSE, class boundaries 44.5 to 110.5; y-axis: Frequency.]
(d) [Figure: Histogram of GLUCOSE. x-axis: GLUCOSE, class boundaries 44.5 to 110.5; y-axis: Relative Frequency.]
(e) Approximately mound-shaped, symmetric.
(f) To create the ogive, place a dot on the x axis at the lower class boundary of the first class, and then, for each class, place a dot above the upper class boundary value at the height of the cumulative frequency for the class. Connect the dots with line segments.
9. (a) Class width = 12
(b)
Class Limits   Class Boundaries   Midpoint   Frequency   Relative Frequency   Cumulative Frequency
1–12           0.5–12.5           6.5        6           0.14                 6
13–24          12.5–24.5          18.5       10          0.24                 16
25–36          24.5–36.5          30.5       5           0.12                 21
37–48          36.5–48.5          42.5       13          0.31                 34
49–60          48.5–60.5          54.5       8           0.19                 42
(c) [Figure: Histogram of Time Until Recurrence. x-axis: Time Until Recurrence, class boundaries 0.5 to 60.5; y-axis: Frequency.]
(d) [Figure: Histogram of Time Until Recurrence. x-axis: Time Until Recurrence, class boundaries 0.5 to 60.5; y-axis: Relative Frequency.]
(e) The distribution is bimodal.
(f) To create the ogive, place a dot on the x axis at the lower class boundary of the first class, and then, for each class, place a dot above the upper class boundary value at the height of the cumulative frequency for the class. Connect the dots with line segments.
10. (a) Class width = 28.
(b)
Class Limits   Class Boundaries   Midpoint   Frequency   Cumulative Frequency
10–37          9.5–37.5           23.5       7           7
38–65          37.5–65.5          51.5       25          32
66–93          65.5–93.5          79.5       26          58
94–121         93.5–121.5         107.5      9           67
122–149        121.5–149.5        135.5      5           72
150–177        149.5–177.5        163.5      0           72
178–205        177.5–205.5        191.5      1           73
(c) [Figure: Histogram of Depth. x-axis: Depth, class boundaries 9.5 to 205.5; y-axis: Frequency.]
(d) [Figure: Histogram of Depth. x-axis: Depth, class boundaries 9.5 to 205.5; y-axis: Relative Frequency.]
(e) This distribution is skewed right with a possible outlier.
(f) To create the ogive, place a dot on the x axis at the lower class boundary of the first class, and then, for each class, place a dot above the upper class boundary value at the height of the cumulative frequency for the class. Connect the dots with line segments.
11. (a) Class width = 9
(b)
Class Limits   Class Boundaries   Midpoint   Frequency   Relative Frequency   Cumulative Frequency
10–18          9.5–18.5           14         6           0.11                 6
19–27          18.5–27.5          23         26          0.47                 32
28–36          27.5–36.5          32         20          0.36                 52
37–45          36.5–45.5          41         1           0.02                 53
46–54          45.5–54.5          50         2           0.04                 55
(c) [Figure: Histogram of MPGAL. x-axis: MPGAL, class boundaries 9.5 to 54.5; y-axis: Frequency.]
(d) [Figure: Histogram of MPGAL. x-axis: MPGAL, class boundaries 9.5 to 54.5; y-axis: Relative Frequency.]
(e) This distribution is skewed right.
(f) To create the ogive, place a dot on the x axis at the lower class boundary of the first class, and then, for each class, place a dot above the upper class boundary value at the height of the cumulative frequency for the class. Connect the dots with line segments.
12. (a) Class width = 6
(b)
Class Limits   Class Boundaries   Midpoint   Frequency   Relative Frequency   Cumulative Frequency
0–5            −0.5–5.5           2.5        13          0.24                 13
6–11           5.5–11.5           8.5        15          0.27                 28
12–17          11.5–17.5          14.5       11          0.20                 39
18–23          17.5–23.5          20.5       3           0.05                 42
24–29          23.5–29.5          26.5       6           0.11                 48
30–35          29.5–35.5          32.5       4           0.07                 52
36–41          35.5–41.5          38.5       2           0.04                 54
42–47          41.5–47.5          44.5       1           0.02                 55
(c) [Figure: Histogram of Three-Syllable Words. x-axis: Three-Syllable Words, class boundaries −0.5 to 47.5; y-axis: Frequency.]
(d) [Figure: Histogram of Three-Syllable Words. x-axis: Three-Syllable Words, class boundaries −0.5 to 47.5; y-axis: Relative Frequency.]
(e) The distribution is skewed right.
(f) To create the ogive, place a dot on the x axis at the lower class boundary of the first class, and then, for each class, place a dot above the upper class boundary value at the height of the cumulative frequency for the class. Connect the dots with line segments.
13. (b)
Class Limits   Class Boundaries   Midpoint   Frequency
46–85          45.5–85.5          65.5       4
86–125         85.5–125.5         105.5      5
126–165        125.5–165.5        145.5      10
166–205        165.5–205.5        185.5      5
206–245        205.5–245.5        225.5      5
246–285        245.5–285.5        265.5      3
[Figure: Histogram of Tonnes. x-axis: class boundaries 0.455 to 2.855; y-axis: Frequency.]
(c)
Class Limits   Class Boundaries   Midpoint   Frequency
0.46–0.85      0.455–0.855        0.655      4
0.86–1.25      0.855–1.255        1.055      5
1.26–1.65      1.255–1.655        1.455      10
1.66–2.05      1.655–2.055        1.855      5
2.06–2.45      2.055–2.455        2.255      5
2.46–2.85      2.455–2.855        2.655      3
14. (b) [Figure: Histogram of Average. x-axis: class boundaries 0.1065 to 0.3215; y-axis: Frequency.]
Class Limits   Class Boundaries   Midpoint   Frequency
107–149        106.5–149.5        128        3
150–192        149.5–192.5        171        4
193–235        192.5–235.5        214        3
236–278        235.5–278.5        257        10
279–321        278.5–321.5        300        6
(c)
Class Limits   Class Boundaries   Midpoint   Frequency
0.107–0.149    0.1065–0.1495      0.128      3
0.150–0.192    0.1495–0.1925      0.171      4
0.193–0.235    0.1925–0.2355      0.214      3
0.236–0.278    0.2355–0.2785      0.257      10
0.279–0.321    0.2785–0.3215      0.300      6
15. (a) 1
(b) About 5/51 = 0.098 = 9.8%
(c) 650 to 750
16. [Figure: Dotplot of Finish Times. Axis: Finish Times, 234 to 360.]
The dotplot shows some of the characteristics of the histogram, such as more dot density from 280 to 340, for instance, that corresponds roughly to the histogram bars of heights 25 and 16. However, the dotplot and histogram are somewhat difficult to compare because the dotplot can be thought of as a histogram with one value, the class mark (i.e., the data value), per class. Because the definitions of the classes (and therefore the class widths) differ, it is difficult to compare the two figures.
17. [Figure: Dotplot of Months. Axis: Months, 0 to 56.]
The dotplot shows some of the characteristics of the histogram, such as the concentration of most of the data in two peaks, one from 13 to 24 and another from 37 to 48. However, the dotplot and histogram are somewhat difficult to compare because the dotplot can be thought of as a histogram with one value, the class mark (i.e., the data value), per class. Because the definitions of the classes (and therefore the class widths) differ, it is difficult to compare the two figures.

Section 2.2

1.
A Pareto chart because it shows the five conditions in their order of importance to employees.
2. A time-series graph because the pattern of stock prices over time is more relevant than just the frequency of a specific range of closing prices.
3. [Figure: Bar Graph for Income vs Education. Vertical axis: Income, 0 to 90,000. Categories: 9th Grade, High School Graduate, Associate Degree, Bachelor Degree, Master Degree, Doctoral Degree.]
4. [Figure: Pareto Chart of Deaths (1000s) vs Country. Vertical axis: Deaths (1000s), 0 to 25. Countries: Korea, Mexico, Portugal, United States, New Zealand, Poland, Hungary, Canada, Switzerland, Australia, France, Japan, Germany, Iceland, Denmark, Spain, Netherlands, Italy, U.K., Sweden.]
5. [Figure: Pareto Chart of Metric Tons vs Fish Species. Vertical axis: Metric Tons, 0 to 80. Species: Walleye Pollock, Pacific Cod, Flatfish, Rockfish, Sablefish.]
6. (a) [Figure: Pareto Chart of Number of Spearheads vs River. Vertical axis: Number of Spearheads, 0 to 35. Rivers: Shannon, Bann, Erne, Barrow, Blackwater.]
(b) [Figure: Pie Chart of Number of Spearheads. Shannon 37.1%, Bann 21.3%, Erne 16.9%, Barrow 15.7%, Blackwater 9.0%.]
7. [Figure: Pie Chart of Hiding Places. Closet 68.0%, Under Bed 23.0%, Bathtub 6.0%, Freezer 3.0%.]
8. [Figure: Pie Chart of Professor Activities. Teaching 51.0%, Research 16.0%, College Service 11.0%, Community Service 11.0%, Consulting 6.0%, Professional Growth 5.0%.]
9.
(a) [Figure: Pareto Chart of Crime Rate vs. Crime Type. Vertical axis: Crime Rate Per 100,000, 0 to 900. Crime types: House Burglary, Motor Vehicle Theft, Assault, Robbery, Rape, Murder.]
(b) Yes, but the graph would take into account only these particular crimes and would not indicate if multiple crimes occurred during the same incident.
10. (a) [Figure: Pareto Chart of Complaints. Vertical axis: Percent Complaint, 0 to 25. Complaints: Tailgating, No Turn Signal, Being Cut Off, Others Drive Too Slow, Others Inconsiderate.]
(b) Since the percentages do not add to 100%, a circle graph cannot be used. If we create an “other” category and assume that all other respondents fit this category, then a circle graph could be created.
11. [Figure: Time Series Plot of Elevation. x-axis: Year, 1986 to 2000; y-axis: Elevation, 3795 to 3820.]
12. [Figure: Time Series Plot of Height. x-axis: Age, 0.5 to 14.0; y-axis: Height, 25 to 65.]

Section 2.3

1. (a) The smallest value is 47 and the largest value is 97, so we need stems 4, 5, 6, 7, 8, and 9. Use the tens digit as the stem and the ones digit as the leaf.

Longevity of Cowboys
4 | 7 = 47 years

4 | 7
5 | 2 7 8 8
6 | 1 6 6 8 8
7 | 0 2 2 3 3 5 6 7
8 | 4 4 4 5 6 6 7 9
9 | 0 1 1 2 3 7

(b) Yes, these cowboys certainly lived long lives, as evidenced by the high frequency of leaves for stems 7, 8, and 9 (i.e., 70-, 80-, and 90-year-olds).
2. The largest value is 91 (percent of wetlands lost) and the smallest value is 9 (percent), which is coded as 09. We need stems 0 to 9. Use the tens digit as the stem and the ones digit as the leaf.
The percentages are concentrated from 20% to 50%. These data are fairly symmetric, perhaps slightly skewed right. There is a gap showing that none of the lower 48 states has lost from 10% to 19% of its wetlands.

Percent of Wetlands Lost
4 | 0 = 40%

0 | 9
1 |
2 | 0 3 4 7 7 8
3 | 0 1 3 5 5 5 6 7 8 8 9
4 | 2 2 6 6 6 8 9 9
5 | 0 0 0 2 2 4 6 6 9 9
6 | 0 7
7 | 2 3 4
8 | 1 5 7 7 9
9 | 0 1

3. The longest average length of stay is 11.1 days in North Dakota, and the shortest is 5.2 days in Utah. We need stems from 5 to 11. Use the digit(s) to the left of the decimal point as the stem and the digit to the right as the leaf.

Average Length of Hospital Stay
5 | 2 = 5.2 days

 5 | 2 3 5 5 6 7
 6 | 0 2 4 6 6 7 7 8 8 8 8 9 9
 7 | 0 0 0 0 0 0 1 1 1 2 2 2 3 3 3 3 4 4 5 5 6 6 8
 8 | 4 5 7
 9 | 4 6 9
10 | 0 3
11 | 1

The distribution is skewed right.
4.
Number of Hospitals per State
0 | 8 = 8 hospitals

 0 | 8                 15 |
 1 | 1 2 5 6 9         16 | 2
 2 | 1 7 7             17 | 5
 3 | 5 7 8             18 |
 4 | 1 2 7             19 | 3
 5 | 1 2 3 9           20 | 9
 6 | 1 6 8             21 |
 7 | 1                 22 | 7
 8 | 8                 23 | 1 6
 9 | 0 2 6 8
10 | 1 2 7             42 | 1
11 | 3 3 7 9           43 |
12 | 2 3 9             44 | 0
13 | 3 3 6
14 | 8

Texas and California have the highest number of hospitals, 421 and 440, respectively. Both states have large populations and large areas. The four largest states by area are Alaska, Texas, California, and Montana.
5. (a) The longest time during 1961–1980 is 23 minutes (i.e., 2:23), and the shortest time is 9 minutes (2:09). We need stems 0, 1, and 2. We’ll use the tens digit as the stem and the ones digit as the leaf, placing leaves 0, 1, 2, 3, and 4 on the first stem and leaves 5, 6, 7, 8, and 9 on the second stem.

Minutes Beyond 2 Hours (1961–1980)
0 | 9 = 9 minutes past 2 hours

0 | 9 9
1 | 0 0 2 3 3
1 | 5 5 6 6 7 8 8 9
2 | 0 2 3 3

(b) The longest time during the period 1981–2000 was 14 (2:14) and the shortest was 7 (2:07), so we’ll need stems 0 and 1 only.
Minutes Beyond 2 Hours (1981–2000)
0 | 7 = 7 minutes past 2 hours

0 | 7 7 7 8 8 8 8 9 9 9 9 9 9 9 9
1 | 0 0 1 1 4

(c) There were seven times under 2:15 during 1961–1980, and there were 20 times under 2:15 during 1981–2000.
6. (a) The largest (worst) score in the first round was 75; the smallest (best) score was 65. We need stems 6 and 7. Leaves 0–4 go on the first stem, and leaves 5–9 belong on the second stem.

First-Round Scores
6 | 8 = score of 68

6 | 5 6 7 7
7 | 0 1 1 1 1 1 1 1 1 1 1 2 2 2 3 3 3 3 4 4 4
7 | 5 5 5 5 5 5 5

(b) The largest score in the fourth round was 74, and the smallest was 68. Here we need stems 6 and 7.

Fourth-Round Scores
6 | 8 = score of 68

6 | 8 9 9 9 9 9
7 | 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 4 4 4

(c) Scores are lower in the fourth round. In the first round, both the low and high scores were more extreme than in the fourth round.
7. The largest value in the data is 29.8 mg of tar per cigarette smoked, and the smallest value is 1.0. We will need stems from 1 to 29, and we will use the numbers to the right of the decimal point as the leaves.

Milligrams of Tar per Cigarette
1 | 0 = 1.0 mg tar

 1 | 0
 2 |
 3 |
 4 | 1 5
 5 |
 6 |
 7 | 3 8
 8 | 0 6 8
 9 | 0
10 |
11 | 4
12 | 0 4 8
13 | 7
14 | 1 5 9
15 | 0 1 2 8
16 | 0 6
17 | 0
. . .
29 | 8

8. The largest value in the data set is 23.5 mg of carbon monoxide per cigarette smoked, and the smallest is 1.5. We need stems from 1 to 23, and we’ll use the numbers to the right of the decimal point as leaves.

Milligrams of Carbon Monoxide
1 | 5 = 1.5 mg CO

 1 | 5
 2 |
 3 |
 4 | 9
 5 | 4
 6 |
 7 |
 8 | 5
 9 | 0 5
10 | 0 2 2 6
11 |
12 | 3 6
13 | 0 6 9
14 | 4 9
15 | 0 4 9
16 | 3 6
17 | 5
18 | 5
. . .
23 | 5

9. The largest value in the data set is 2.03 mg of nicotine per cigarette smoked. The smallest value is 0.13. We will need stems 0, 1, and 2.
We will use the number to the left of the decimal point as the stem and the first number to the right of the decimal point as the leaf. The digit two places to the right of the decimal point (the hundredths digit) will be truncated (not rounded).

Milligrams of Nicotine per Cigarette
0 | 1 = 0.1 milligram

0 | 1 4 4
0 | 5 6 6 6 7 7 7 8 8 9 9 9
1 | 0 0 0 0 0 0 0 1 2
1 |
2 | 0

10. (a) For Site I, the least depth is 25 cm, and the greatest depth is 110 cm. For Site II, the least depth is 20 cm, and the greatest depth is 125 cm.
(b) The Site I depth distribution is fairly symmetric, centered near 70 cm. Site II is fairly uniform in shape except that there is a huge gap with no artifacts from 70 to 100 cm.
(c) It would appear that Site II probably was unoccupied during the time period associated with 70 to 100 cm.

Chapter 2 Review

1. (a) Bar graphs, Pareto charts, pie charts
(b) All, but quantitative data must be categorized to use a bar graph, Pareto chart, or pie chart.
2. A time-series graph because a change over time is most relevant
3. Any large gaps between bars or stems might indicate potential outliers.
4. Dotplots and stem-and-leaf displays both show every data value. Stem-and-leaf plots group the data with the same stem, whereas dotplots only group the data with identical values.
5. (a) Figure 2-1(a) (in the text) is essentially a bar graph with a “horizontal” axis showing years and a “vertical” axis showing miles per gallon. However, in depicting the data as a highway and showing them in perspective, the ability to correctly compare bar heights visually has been lost.
For example, determining what would appear to be the bar heights by measuring from the white line on the road to the edge of the road along a line drawn from the year to its mpg value, we get the bar height for 1983 to be approximately ⅞ inch and the bar height for 1985 to be approximately 1⅜ inches (i.e., 11/8 inches). Taking the ratio of the given bar heights, we see that the bar for 1985 should be $\frac{27.5}{26} \approx 1.06$ times the length of the 1983 bar. However, the measurements show a ratio of $\frac{11/8}{7/8} = \frac{11}{7} \approx 1.6$; i.e., the 1985 bar is (visually) 1.6 times the length of the 1983 bar. Also, the years are evenly spaced numerically, but the figure shows the more recent years to be more widely spaced owing to the use of perspective.
(b) Figure 2-1(b) is a time-series graph showing the years on the x axis and miles per gallon on the y axis. Everything is to scale and not distorted visually by the use of perspective. It is easy to see the mpg standards for each year, and you also can see how fuel economy standards for new cars have changed over the 8 years shown (i.e., a steep increase in the early years and a leveling off in the later years).
6. (a) We estimate the 1980 prison population at approximately 140 prisoners per 100,000 and the 1997 population at approximately 440 prisoners per 100,000 people.
(b) The number of inmates per 100,000 increased every year.
(c) The population 266,574,000 is 2,665.74 × 100,000, and 444 per 100,000 is $\frac{444}{100{,}000}$. So $\frac{444}{100{,}000} \times (2{,}665.74 \times 100{,}000) \approx 1{,}183{,}589$ prisoners. The projected 2020 population is 323,724,000, or 3,237.24 × 100,000. So $\frac{444}{100{,}000} \times (3{,}237.24 \times 100{,}000) \approx 1{,}437{,}335$ prisoners.
7. Owing to rounding, the percentages are slightly different from those in the text.
[Pie chart of tax return difficulties: IRS jargon 43.4%, deductions 28.3%, correct form 10.1%, unknown 10.1%, calculation 8.1%]
8.
(a) Since the ages are two-digit numbers, use the ten’s digit as the stem and the one’s digit as the leaf.
Age of DUI Arrests
1 | 6 = 16 years
1 | 6 8
2 | 0 1 1 2 2 2 3 4 4 5 6 6 6 7 7 7 9
3 | 0 0 1 1 2 3 4 4 5 5 6 7 8 9
4 | 0 0 1 3 5 6 7 7 9 9
5 | 1 3 5 6 8
6 | 3 4
(b) The largest age is 64 and the smallest is 16, so the class width for seven classes is $\frac{64 - 16}{7} \approx 6.86$; use 7. The lower class limit for the first class is 16; the lower class limit for the second class is 16 + 7 = 23. The total number of data points is 50, so calculate the relative frequency by dividing the class frequency by 50.
Age Distribution of DUI Arrests
Class Limits | Class Boundaries | Midpoint | Frequency | Relative Frequency | Cumulative Frequency
16–22 | 15.5–22.5 | 19 | 8 | 0.16 | 8
23–29 | 22.5–29.5 | 26 | 11 | 0.22 | 19
30–36 | 29.5–36.5 | 33 | 11 | 0.22 | 30
37–43 | 36.5–43.5 | 40 | 7 | 0.14 | 37
44–50 | 43.5–50.5 | 47 | 6 | 0.12 | 43
51–57 | 50.5–57.5 | 54 | 4 | 0.08 | 47
58–64 | 57.5–64.5 | 61 | 3 | 0.06 | 50
The class boundaries are the average of the upper class limit of one class and the lower class limit of the next class. The midpoint is the average of the class limits for that class.
(c) [Histogram of Age: frequency versus age, with class boundaries 15.5, 22.5, 29.5, 36.5, 43.5, 50.5, 57.5, 64.5]
(d) This distribution is skewed right.
9. (a) The largest value is 142 mm, and the smallest value is 69 mm. For seven classes, we need a class width of $\frac{142 - 69}{7} \approx 10.4$; use 11. The lower class limit of the first class is 69, and the lower class limit of the second class is 69 + 11 = 80. The class boundaries are the average of the upper class limit of one class and the lower class limit of the next higher class. The midpoint is the average of the class limits for that class. There are 60 data values total, so the relative frequency is the class frequency divided by 60.
Class Limits | Class Boundaries | Midpoint | Frequency | Relative Frequency | Cumulative Frequency
69–79 | 68.5–79.5 | 74 | 2 | 0.03 | 2
80–90 | 79.5–90.5 | 85 | 3 | 0.05 | 5
91–101 | 90.5–101.5 | 96 | 8 | 0.13 | 13
102–112 | 101.5–112.5 | 107 | 19 | 0.32 | 32
113–123 | 112.5–123.5 | 118 | 22 | 0.37 | 54
124–134 | 123.5–134.5 | 129 | 3 | 0.05 | 57
135–145 | 134.5–145.5 | 140 | 3 | 0.05 | 60
(b) [Histogram of Circumference: frequency versus circumference, with class boundaries 68.5, 79.5, 90.5, 101.5, 112.5, 123.5, 134.5, 145.5]
(c) [Histogram of Circumference: relative frequency (%) versus circumference, with the same class boundaries]
(d) This distribution is skewed left.
(e) The ogive begins on the x axis at the lower class boundary and connects dots placed at (x, y) coordinates (upper class boundary, cumulative frequency).
10. (a) General torts occur most frequently.
[Pareto chart of filings (in thousands) by type: Torts, Contracts, Asbestos, Other Product, All Other]
(b) [Pie chart of filings: Torts 47.0%, Contracts 26.4%, Asbestos 12.1%, Other Product 9.4%, All Other 5.2%]
11. (a) To determine the decade that contained the most samples, count both rows (if shown) of leaves; recall that leaves 0–4 belong on the first line and leaves 5–9 belong on the second line when two lines per stem are used. The greatest number of leaves is found on stem 124, i.e., the 1240s (the 40s decade in the 1200s), with 40 samples.
(b) The number of samples with tree-ring dates 1200 to 1239 A.D. is 28 + 3 + 19 + 25 = 75.
(c) The dates of the longest interval with no sample values are 1204 through 1211 A.D.
This might mean that for these 8 years, the pueblo was unoccupied (thus no new or repaired structures), or that the population remained stable (no new structures needed), or that, say, weather conditions were favorable those years, so existing structures didn’t need repair. If relatively few new structures were built or repaired during this period, their tree rings might have been missed during sample selection.
Chapter 3: Averages and Variation
Calculations may vary slightly owing to rounding.
Section 3.1
1. The middle value is the median. The most frequent value is the mode. The mean takes all values into account.
2. The symbol for the sample mean is $\bar{x}$, and the symbol for the population mean is μ.
3. For a mound-shaped, symmetric distribution, the mean, median, and mode all will be equal.
4. (a) Mean, median, and mode if it exists (b) Mode if it exists (c) Mean, median, and mode if it exists
5. (a) Mode = 5, the most common value. Median = 4, the middle value in the ordered data set. Mean = $\frac{2 + 3 + 4 + 5 + 5}{5} = \frac{19}{5} = 3.8$
(b) Only the mode (c) All three make sense. (d) The mode and the median
6. (a) Mode = 2, the most common value. Median = 3, the middle value in the ordered data set. Mean = $\frac{2 + 2 + 3 + 6 + 10}{5} = 4.6$
(b) Mode = 7, median = 8, mean = 9.6, using the same techniques as part (a)
(c) Each statistic was increased by 5. In general, adding a constant c to each value in a data set results in the mode, median, and mean increasing by c.
7. (a) Mode = 2, the most common value. Median = 3, the middle value in the ordered data set. Mean = $\frac{2 + 2 + 3 + 6 + 10}{5} = 4.6$
(b) Mode = 10, median = 15, mean = 23, using the same techniques as part (a)
(c) Each statistic was multiplied by 5. In general, multiplying each value in a data set by a constant c results in the mode, median, and mean being multiplied by c.
(d) Mode = 177.8 cm, median = 172.72 cm, mean = 180.34 cm
8.
(a) If the largest data value is replaced by a larger value, the mean will increase because the sum of the data values will increase. The median will not change because the same value will still be in the eighth position when the data are ordered.
(b) If the largest value is replaced by a smaller value (but still higher than the median), the mean will decrease because the sum of the data values will decrease. The median will not change because the same value will be in the eighth position in increasing order.
(c) If the largest value is replaced by a value that is smaller than the median, the mean will decrease because the sum of the data values will decrease. The median also will decrease because the former value in the eighth position will move to the ninth position in increasing order. The median will be the new value in the eighth position.
9. Mean = $\frac{146 + 152 + \cdots + 144}{14} \approx 167.3$. To compute the median, first order the data set smallest to largest. Then Median = $\frac{168 + 174}{2} = 171$. Mode = most common value = 178.
10. Mean = $\frac{111}{18} \approx 6.167$. Median = $\frac{5 + 7}{2} = 6$. Mode = 7.
11. First, organize the data from smallest to largest. Then compute the mean, median, and mode.
(a) Upper Canyon: 1 1 1 2 3 3 3 3 4 6 9
Mean = $\bar{x} = \frac{\Sigma x}{n} = \frac{36}{11} \approx 3.27$; Median = 3 (middle value); Mode = 3 (occurs most frequently)
(b) Lower Canyon: 0 0 1 1 1 1 2 2 3 6 7 8 13 14
Mean = $\bar{x} = \frac{\Sigma x}{n} = \frac{59}{14} \approx 4.21$; Median = $\frac{2 + 2}{2} = 2$; Mode = 1 (occurs most frequently)
(c) The mean for the Lower Canyon is greater than that of the Upper Canyon. However, the median and mode for the Lower Canyon are less than those of the Upper Canyon.
(d) 5% of 14 is 0.7, which rounds to 1. So eliminate one data value from the bottom of the list and one from the top. Then compute the mean of the remaining 12 values: 5% trimmed mean = $\frac{\Sigma x}{n} = \frac{45}{12} = 3.75$. Now this value is closer to the Upper Canyon mean.
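The arithmetic in Problem 11 can be checked with a short script. The following is a minimal sketch, not part of the original solutions: it uses Python's standard `statistics` module, and the helper name `trimmed_mean` is a hypothetical one introduced here. It assumes the trimming rule stated in part (d), namely rounding 5% of n to the nearest whole number and dropping that many values from each end of the ordered data.

```python
from statistics import mean, median, multimode


def trimmed_mean(values, proportion):
    """Mean after dropping round(proportion * n) values from each end.

    Hypothetical helper: mirrors the rule in part (d), where 5% of 14
    is 0.7, which rounds to 1 value trimmed from each end.
    """
    data = sorted(values)
    k = round(proportion * len(data))  # values to trim from each end
    kept = data[k:len(data) - k] if k else data
    return mean(kept)


# Data as listed in Problem 11 (already ordered).
upper_canyon = [1, 1, 1, 2, 3, 3, 3, 3, 4, 6, 9]
lower_canyon = [0, 0, 1, 1, 1, 1, 2, 2, 3, 6, 7, 8, 13, 14]

for name, data in (("Upper Canyon", upper_canyon), ("Lower Canyon", lower_canyon)):
    # multimode returns a list so that ties, if any, are all reported.
    print(name, round(mean(data), 2), median(data), multimode(data))

print("5% trimmed mean, Lower Canyon:", trimmed_mean(lower_canyon, 0.05))
```

Under these assumptions the script reproduces the values above: means of about 3.27 and 4.21, medians 3 and 2, modes 3 and 1, and a 5% trimmed mean of 3.75 for the Lower Canyon.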
12. (a) Mean = $\bar{x} = \frac{\Sigma x}{n} = \frac{1050}{40} \approx 26.3$ years; Median = $\frac{25 + 26}{2} = 25.5$ years; Mode = 25
(b) The three averages are close, so each represents the age fairly accurately. There may be one high outlier (37), so the median may be the best measure.
13. (a) Mean = $\bar{x} = \frac{2723}{20} \approx \$136.20$; Median = $\frac{65 + 68}{2} = \$66.50$; Mode = \$60.00
(b) 5% of 20 data values is 1, so we remove the smallest and largest values and recompute the mean: Mean = $\bar{x} = \frac{2183}{18} \approx \$121.30$. The trimmed mean is still much larger than the median.
(c) Reporting the median certainly will give the customer a much lower figure for the daily cost, but that really doesn’t tell the whole story. Reporting the mean and the median, as well as the high outliers, may be the most useful description of the situation.
14. Weighted average = $\frac{\Sigma xw}{\Sigma w} = \frac{92(0.25) + 81(0.225) + 93(0.225) + 85(0.30)}{1} = 87.65$
15. Weighted average = $\frac{\Sigma xw}{\Sigma w} = \frac{9(2) + 7(3) + 6(1) + 10(4)}{2 + 3 + 1 + 4} = \frac{85}{10} = 8.5$
16. (a) Weighted average = $\frac{\Sigma xw}{\Sigma w} = \frac{64.1(0.38) + 75.8(0.47) + 23.9(0.07) + 68.2(0.08)}{1} \approx 67.1$ mg/L
(b) Since 67.1 mg/L is greater than 58 mg/L, this wetlands system does not meet the target standard for the chlorine compound. The average chlorine compound mg/L is too high.
17. Harmonic mean = $\frac{2}{\frac{1}{60} + \frac{1}{75}} \approx 66.67$ mph
18. Geometric mean = $\sqrt[5]{1.10 \times 1.12 \times 1.148 \times 1.038 \times 1.16} \approx 1.112$. Thus the average growth factor is approximately 1.112, i.e., an average growth rate of about 11% per year.
Section 3.2
1. The mean is associated with the standard deviation.
2. The standard deviation is the square root of the variance.
3. Yes. When computing the sample standard deviation, divide by n – 1. When computing the population standard deviation, divide by n.
4. The symbol for the sample standard deviation is s. The symbol for the population standard deviation is σ.
5.
(a) i, ii, iii
(b) The data change between data sets (i) and (ii) increased the squared difference sum $\Sigma(x - \bar{x})^2$ by 10, whereas the data change between data sets (ii) and (iii) increased the squared difference sum $\Sigma(x - \bar{x})^2$ by only 6.
6. (a) $s = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n - 1}} \approx 3.61$
(b) Adding a constant to each data value does not change s. Thus s ≈ 3.61.
(c) Shifting data by c units does not change the standard deviation.
7. (a) s ≈ 3.61 (same as above) (b) s ≈ 18.0
(c) We see that the standard deviation has been multiplied by 5. In general, multiplying each data value by a constant c will result in the standard deviation being multiplied by the absolute value of c.
8. (a) No, 80 is only 2 standard deviations away from its mean. (b) Yes, 80 is 3.33 standard deviations away from its mean.
9. (a) Range = maximum – minimum = 30 – 15 = 15
(b) Use a calculator to verify that $\Sigma x = 110$ and that $\Sigma x^2 = 2{,}568$.
(c) Computation formula (sample data) for $s^2$:
$s = \sqrt{\frac{\Sigma x^2 - \frac{(\Sigma x)^2}{n}}{n - 1}} = \sqrt{\frac{2568 - \frac{(110)^2}{5}}{5 - 1}} \approx 6.08$; $s^2 = 6.08^2 \approx 37$
(d) $\bar{x} = \frac{\Sigma x}{n} = \frac{110}{5} = 22$. Defining formula (sample data) for $s^2$:
$s = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{(23 - 22)^2 + (17 - 22)^2 + \cdots + (25 - 22)^2}{5 - 1}} \approx 6.08$; $s^2 = 6.08^2 \approx 37$
(e) $\mu = 22$
$\sigma = \sqrt{\frac{\Sigma(x - \mu)^2}{N}} = \sqrt{\frac{(23 - 22)^2 + (17 - 22)^2 + \cdots + (25 - 22)^2}{5}} \approx 5.44$; $\sigma^2 = 5.44^2 \approx 29.59$
10.
(a)
x | x² | y | y²
11 | 121 | 10 | 100
0 | 0 | −2 | 4
36 | 1296 | 29 | 841
21 | 441 | 14 | 196
31 | 961 | 22 | 484
23 | 529 | 18 | 324
24 | 576 | 14 | 196
−11 | 121 | −2 | 4
−11 | 121 | −3 | 9
−21 | 441 | −10 | 100
Σx = 103 | Σx² = 4607 | Σy = 90 | Σy² = 2258
(b) $\bar{x} = \frac{\Sigma x}{n} = \frac{103}{10} = 10.3$; $\bar{y} = \frac{\Sigma y}{n} = \frac{90}{10} = 9$
$s_x = \sqrt{\frac{\Sigma x^2 - \frac{(\Sigma x)^2}{n}}{n - 1}} = \sqrt{\frac{4607 - \frac{(103)^2}{10}}{10 - 1}} \approx 19.85$; $s_x^2 = 19.85^2 \approx 394.0$
$s_y = \sqrt{\frac{\Sigma y^2 - \frac{(\Sigma y)^2}{n}}{n - 1}} = \sqrt{\frac{2258 - \frac{(90)^2}{10}}{10 - 1}} \approx 12.68$; $s_y^2 = 12.68^2 \approx 160.8$
(c) $\bar{x} - 2s = 10.3 - 2(19.85) = -29.4$; $\bar{x} + 2s = 10.3 + 2(19.85) = 50$
$\bar{y} - 2s = 9 - 2(12.68) = -16.36$; $\bar{y} + 2s = 9 + 2(12.68) = 34.36$
At least 75% of the returns for the Total Stock Fund fall between –29.4% and 50%, whereas at least 75% of the returns for the Balanced Index fall between –16.36% and 34.36%.
(d) Stock fund: $CV = \frac{s}{\bar{x}} \cdot 100 = \frac{19.85}{10.3} \cdot 100 \approx 192.7\%$
Balanced fund: $CV = \frac{s}{\bar{y}} \cdot 100 = \frac{12.68}{9} \cdot 100 \approx 140.9\%$
For each unit of return, the balanced fund has lower risk. Since the CV can be thought of as a measure of risk per unit of expected return, a smaller CV is better because a lower risk is better.
11. (a) Range = 7.89 – 0.02 = 7.87
(b) Use a calculator to verify that $\Sigma x = 62.11$ and $\Sigma x^2 = 164.23$.
(c) $\bar{x} = \frac{\Sigma x}{n} = \frac{62.11}{50} \approx 1.24$
$s = \sqrt{\frac{\Sigma x^2 - \frac{(\Sigma x)^2}{n}}{n - 1}} = \sqrt{\frac{164.23 - \frac{(62.11)^2}{50}}{50 - 1}} \approx 1.333 \approx 1.33$; $s^2 = 1.333^2 \approx 1.78$
(d) $CV = \frac{s}{\bar{x}} \cdot 100 = \frac{1.33}{1.24} \cdot 100 \approx 107\%$
The standard deviation of the time to failure is just slightly larger than the average time.
12. (a)
x | x² | y | y²
13.20 | 174.24 | 11.85 | 140.42
5.60 | 31.36 | 15.25 | 232.56
19.80 | 392.04 | 21.30 | 453.69
15.05 | 226.50 | 17.30 | 299.29
21.40 | 457.96 | 27.50 | 756.25
17.25 | 297.56 | 10.35 | 107.12
27.45 | 753.50 | 14.90 | 222.01
16.95 | 287.30 | 48.70 | 2371.69
23.90 | 571.21 | 25.40 | 645.16
32.40 | 1049.76 | 25.95 | 673.40
40.75 | 1660.56 | 57.60 | 3317.76
5.10 | 26.01 | 34.35 | 1179.92
17.75 | 315.06 | 38.80 | 1505.44
28.35 | 803.72 | 41.00 | 1681.00
 | | 31.25 | 976.56
Σx = 284.95 | Σx² = 7046.80 | Σy = 421.5 | Σy² = 14,562.27
(b) Grid E: $\bar{x} = \frac{\Sigma x}{n} = \frac{284.95}{14} = 20.35$
$s^2 = \frac{\Sigma x^2 - \frac{(\Sigma x)^2}{n}}{n - 1} = \frac{7046.80 - \frac{(284.95)^2}{14}}{14 - 1} \approx 96$; $s = \sqrt{s^2} = \sqrt{96} \approx 9.80$
Grid H: $\bar{y} = \frac{\Sigma y}{n} = \frac{421.5}{15} = 28.1$
$s^2 = \frac{\Sigma y^2 - \frac{(\Sigma y)^2}{n}}{n - 1} = \frac{14{,}562.27 - \frac{(421.5)^2}{15}}{15 - 1} \approx 194$; $s = \sqrt{s^2} = \sqrt{194} \approx 13.93$
(c) $\bar{x} - 2s = 20.35 - 2(9.80) = 0.75$; $\bar{x} + 2s = 20.35 + 2(9.80) = 39.95$
For Grid E, at least 75% of the data fall in the interval 0.75–39.95.
$\bar{y} - 2s = 28.1 - 2(13.93) = 0.24$; $\bar{y} + 2s = 28.1 + 2(13.93) = 55.96$
For Grid H, at least 75% of the data fall in the interval 0.24–55.96. Grid H shows a wider 75% range of values.
(d) Grid E: $CV = \frac{s}{\bar{x}} \cdot 100 = \frac{9.80}{20.35} \cdot 100 \approx 48\%$
Grid H: $CV = \frac{s}{\bar{y}} \cdot 100 = \frac{13.93}{28.1} \cdot 100 \approx 49\%$
Grid H demonstrates slightly greater variability per expected signal. The CV, together with the interval from part (c), indicates that Grid H might have more buried artifacts.
13. (a) Students verify results with a calculator.
(b) $\bar{x} = \frac{\Sigma x}{n} = \frac{245}{5} = 49$
$s = \sqrt{\frac{\Sigma x^2 - \frac{(\Sigma x)^2}{n}}{n - 1}} = \sqrt{\frac{14{,}755 - \frac{(245)^2}{5}}{5 - 1}} \approx 26.22$; $s^2 = 26.22^2 \approx 687.49$
(c) $\bar{y} = \frac{\Sigma y}{n} = \frac{224}{5} = 44.8$
$s = \sqrt{\frac{\Sigma y^2 - \frac{(\Sigma y)^2}{n}}{n - 1}} = \sqrt{\frac{12{,}070 - \frac{(224)^2}{5}}{5 - 1}} \approx 22.55$; $s^2 = 22.55^2 \approx 508.50$
(d) Mallard nest: $CV = \frac{s}{\bar{x}} \cdot 100 = \frac{26.22}{49} \cdot 100 \approx 53.5\%$
Canada Goose nest: $CV = \frac{s}{\bar{y}} \cdot 100 = \frac{22.55}{44.8} \cdot 100 \approx 50.3\%$
The CV gives the ratio of the standard deviation to the mean. With respect to their means, the variation for the mallards is slightly higher than the variation for the Canada geese.
14. (a) Pax: $CV = \frac{s}{\bar{x}} \cdot 100 = \frac{14.05}{9.58} \cdot 100 \approx 146.7\%$
Vanguard: $CV = \frac{s}{\bar{x}} \cdot 100 = \frac{12.50}{9.02} \cdot 100 \approx 138.6\%$
Vanguard fund has slightly less risk per unit of return.
(b) Pax: $\bar{x} - 2s = 9.58 - 2(14.05) = -18.52$; $\bar{x} + 2s = 9.58 + 2(14.05) = 37.68$
At least 75% of returns for Pax fall within the interval −18.52% to 37.68%.
Vanguard: $\bar{x} - 2s = 9.02 - 2(12.50) = -15.98$; $\bar{x} + 2s = 9.02 + 2(12.50) = 34.02$
At least 75% of the returns for Vanguard fall within the interval −15.98% to 34.02%. Vanguard has a narrower range of returns, with less downside, but also less upside.
15. $CV = \frac{s}{\bar{x}} \cdot 100$, so $\bar{x} \cdot CV = s \cdot 100$ and $s = \frac{\bar{x}(CV)}{100} = \frac{2.2(1.5)}{100} = 0.033$
16.
Class | f | x | xf | x − x̄ | (x − x̄)² | (x − x̄)²f
1–10 | 34 | 5.5 | 187 | −10.6 | 112.36 | 3820.24
11–20 | 18 | 15.5 | 279 | −0.6 | 0.36 | 6.48
21–30 | 17 | 25.5 | 433.5 | 9.4 | 88.36 | 1502.12
31 and over | 11 | 35.5 | 390.5 | 19.4 | 376.36 | 4139.96
n = Σf = 80; Σxf = 1290; Σ(x − x̄)²f = 9468.8
$\bar{x} = \frac{\Sigma xf}{n} = \frac{1290}{80} \approx 16.1$; $s^2 = \frac{\Sigma(x - \bar{x})^2 f}{n - 1} = \frac{9468.8}{79} \approx 119.9$; $s = \sqrt{119.9} \approx 10.95$
17.
Class | f | x | xf | x − x̄ | (x − x̄)² | (x − x̄)²f
21–30 | 260 | 25.5 | 6,630 | −10.3 | 106.09 | 27,583.4
31–40 | 348 | 35.5 | 12,354 | −0.3 | 0.09 | 31.3
41 and over | 287 | 45.5 | 13,058.5 | 9.7 | 94.09 | 27,003.8
n = Σf = 895; Σxf = 32,042.5; Σ(x − x̄)²f = 54,619
$\bar{x} = \frac{\Sigma xf}{n} = \frac{32{,}042.5}{895} \approx 35.80$; $s^2 = \frac{\Sigma(x - \bar{x})^2 f}{n - 1} = \frac{54{,}619}{894} \approx 61.1$; $s = \sqrt{61.1} \approx 7.82$
18.
x | f | xf | x²f
3.5 | 2 | 7 | 24.5
4.5 | 2 | 9 | 40.5
5.5 | 4 | 22 | 121.0
6.5 | 22 | 143 | 929.5
7.5 | 64 | 480 | 3,600.0
8.5 | 90 | 765 | 6,502.5
9.5 | 14 | 133 | 1,263.5
10.5 | 2 | 21 | 220.5
Σf = 200; Σxf = 1,580; Σx²f = 12,702
$\bar{x} = \frac{\Sigma xf}{n} = \frac{1{,}580}{200} = 7.9$
$SS_x = \Sigma x^2 f - \frac{(\Sigma xf)^2}{n} = 12{,}702 - \frac{(1{,}580)^2}{200} = 220$
$s = \sqrt{\frac{SS_x}{n - 1}} = \sqrt{\frac{220}{199}} \approx 1.05$
$CV = \frac{s}{\bar{x}} \times 100 = \frac{1.05}{7.9} \times 100 \approx 13.29\%$
19.
Class | f | x | xf | x − x̄ | (x − x̄)² | (x − x̄)²f
8.6–12.5 | 15 | 10.55 | 158.25 | −5.05 | 25.502 | 382.537
12.6–16.5 | 20 | 14.55 | 291.00 | −1.05 | 1.102 | 22.050
16.6–20.5 | 5 | 18.55 | 92.75 | 2.95 | 8.703 | 43.513
20.6–24.5 | 7 | 22.55 | 157.85 | 6.95 | 48.303 | 338.118
24.6–28.5 | 3 | 26.55 | 79.65 | 10.95 | 119.903 | 359.708
n = Σf = 50; Σxf = 779.5; Σ(x − x̄)²f = 1,145.9
$\bar{x} = \frac{\Sigma xf}{n} = \frac{779.5}{50} \approx 15.6$; $s^2 = \frac{\Sigma(x - \bar{x})^2 f}{n - 1} = \frac{1{,}145.9}{49} \approx 23.4$; $s = \sqrt{23.4} \approx 4.8$
20. (a) Students can use a TI-83 to verify the calculations.
(b) For 1992, $\bar{x} = \frac{1.78 + 17.79 + 7.46}{3} = 9.01$. For 2000, $\bar{x} = \frac{17.49 + 6.80 - 2.38}{3} = 7.30$.
(c) Students can use a TI-83 to verify the calculations.
(d) The 3-year moving averages have approximately the same mean as computed in part (a), but the standard deviation is much smaller.
21. $\Sigma(x - \bar{x})^2 = \Sigma(x^2 - 2x\bar{x} + \bar{x}^2) = \Sigma x^2 - 2\bar{x}\Sigma x + n\bar{x}^2 = \Sigma x^2 - 2\bar{x}(n\bar{x}) + n\bar{x}^2 = \Sigma x^2 - 2n\bar{x}^2 + n\bar{x}^2 = \Sigma x^2 - n\bar{x}^2 = \Sigma x^2 - n\left(\frac{\Sigma x}{n}\right)^2 = \Sigma x^2 - \frac{(\Sigma x)^2}{n}$
Section 3.3
1. 82% or more of the scores were at or below her score. 100% − 82% = 18% or fewer of the scores were above her score.
2. The upper quartile is the 75th percentile. Therefore, the minimum percentile rank must be the 75th percentile.
3. No, the score 82 might have a percentile rank less than 70. Raw scores are not necessarily equal to percentile scores.
4. Timothy performed better because a percentile rank of 72 is greater than a percentile rank of 70.
5. Order the data from smallest to largest. Lowest value = 2; highest value = 42. There are 20 data values. Median = $\frac{23 + 23}{2} = 23$. There are 10 values less than the $Q_2$ position and 10 values greater than the $Q_2$ position.
$Q_1 = \frac{8 + 11}{2} = 9.5$; $Q_3 = \frac{28 + 29}{2} = 28.5$; $IQR = Q_3 - Q_1 = 28.5 - 9.5 = 19$
[Boxplot of Months for Nurses; vertical axis: months]
6. (a) Order the data from smallest to largest. Lowest value = 3; highest value = 72. There are 20 data values. Median = $\frac{22 + 24}{2} = 23$. There are 10 values less than the median and 10 values greater than the median.
$Q_1 = \frac{15 + 17}{2} = 16$; $Q_3 = \frac{29 + 31}{2} = 30$; $IQR = Q_3 - Q_1 = 30 - 16 = 14$
[Boxplot of Months Clerical; vertical axis: months]
(b) The median for nurses and clerical workers is 23 months. The upper half of the data for the nurses falls between values of 23 and 42 months, whereas the upper half of the data for the clerical workers falls between 23 and 72 months. The distance between $Q_3$ and the maximum for nurses is 13.5 months; for clerical workers, this distance is 42 months. The distance between $Q_1$ and the minimum for nurses is 7.5 months; for clerical workers, this distance is 13 months.
7. (a) Lowest value = 17; highest value = 38. There are 50 data values. Median = $\frac{24 + 24}{2} = 24$. There are 25 values above and 25 values below the $Q_2$ position.
$Q_1 = 22$; $Q_3 = 27$; $IQR = 27 - 22 = 5$
[Boxplot of College Graduates; vertical axis: percentage of college graduates, 15 to 35]
(b) 26% is in the third quartile because it is between the median and $Q_3$.
8. (a) Lowest value = 5; highest value = 15. There are 50 data values. Median = $\frac{10 + 10}{2} = 10$. There are 25 values above and 25 values below the $Q_2$ position.
$Q_1 = 9$; $Q_3 = 12$; $IQR = 12 - 9 = 3$
[Boxplot of High School Dropouts; vertical axis: percentage of high school dropouts, 5.0 to 15.0]
(b) 7% is in the first quartile because it is below $Q_1$.
9. (a) California has the lowest premium, and Pennsylvania has the highest.
(b) Pennsylvania has the highest median premium.
(c) California has the smallest range, and Texas has the smallest IQR.
(d) The smallest IQR will be Texas. The largest IQR will be Pennsylvania.
For figure (a), IQR = 3,652 – 2,758 = 894
For figure (b), IQR = 5,801 – 4,326 = 1,475
For figure (c), IQR = 3,966 – 2,801 = 1,165
Therefore, figure (a) is Texas and figure (b) is Pennsylvania. By elimination, figure (c) is California.
10. (a) Order the data from smallest to largest. Lowest value = 4; highest value = 80. There are 24 data values. Median = $\frac{65 + 66}{2} = 65.5$. There are 12 values above and 12 values below the median.
$Q_1 = \frac{61 + 62}{2} = 61.5$; $Q_3 = \frac{71 + 72}{2} = 71.5$
[Boxplot of Heights; vertical axis: heights, 0 to 80]
(b) $IQR = Q_3 - Q_1 = 71.5 - 61.5 = 10$
(c) $1.5(IQR) = 1.5(10) = 15$
Lower limit: $Q_1 - 1.5(IQR) = 61.5 - 15 = 46.5$
Upper limit: $Q_3 + 1.5(IQR) = 71.5 + 15 = 86.5$
(d) Yes, the value 4 is below the lower limit and so is an outlier; it is probably an error.
Chapter 3 Review
1. (a) The variance and the standard deviation (b) Box-and-whisker plot
2. (a) For (i), the mode is the tallest bar, namely, 7; the median and mean are estimated to be 7. For (ii), the mode = median = mean = 7.
(b) Distribution (i) will have a larger standard deviation because more data are in the tails. This is indicated by the tall bars at values of 4 and 10.
3. (a) For both data sets, the mean is 20. Also, for both data sets, the range = maximum – minimum = 31 – 7 = 24.
(b) Data set C1 seems more symmetric because the mean equals the median and the median is centered in the interquartile range.
(c) For C1, IQR = 25 – 15 = 10. For C2, IQR = 22 – 20 = 2. Thus, for C1, the middle 50% of the data have a range of 10, whereas for C2, the middle 50% of the data have a smaller range of 2.
4. (a) Mean = $\bar{x} = \frac{\Sigma x}{n} = \frac{1.9 + 2.8 + \cdots + 7.2}{8} = \frac{36.2}{8} = 4.525$
Order the data from smallest to largest.
1.9 1.9 2.8 3.9 4.2 5.7 7.2 8.6
Median = $\frac{3.9 + 4.2}{2} = 4.05$
The mode is 1.9 because it is the value that occurs most frequently.
(b) $s = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n - 1}} = \sqrt{\frac{42.395}{7}} \approx 2.46$; $CV = \frac{s}{\bar{x}} \cdot 100 = \frac{2.46}{4.525} \cdot 100 \approx 54.4\%$; Range = 8.6 − 1.9 = 6.7
5. (a) Lowest value = 31; highest value = 68. There are 60 data values. Median = $\frac{45 + 45}{2} = 45$. There are 30 values above and 30 values below the $Q_2$ position.
$Q_1 = \frac{40 + 40}{2} = 40$; $Q_3 = \frac{52 + 53}{2} = 52.5$; $IQR = 52.5 - 40 = 12.5$
[Boxplot of Percentage of Georgia Democrats by County; vertical axis: percentage of Georgia Democrats, 30 to 70]
(b) Class width = 8
Class | Midpoint x | f | xf | x²f
31–38 | 34.5 | 11 | 379.5 | 13,092.8
39–46 | 42.5 | 24 | 1020 | 43,350.0
47–54 | 50.5 | 15 | 757.5 | 38,253.8
55–62 | 58.5 | 7 | 409.5 | 23,955.8
63–70 | 66.5 | 3 | 199.5 | 13,266.8
n = Σf = 60; Σxf = 2,766; Σx²f = 131,919
$\bar{x} = \frac{\Sigma xf}{n} = \frac{2{,}766}{60} = 46.1$
$s = \sqrt{\frac{\Sigma x^2 f - \frac{(\Sigma xf)^2}{n}}{n - 1}} = \sqrt{\frac{131{,}919 - \frac{(2{,}766)^2}{60}}{59}} = \sqrt{\frac{4{,}406.4}{59}} \approx 8.64$
$\bar{x} - 2s = 46.1 - 2(8.64) = 28.82$; $\bar{x} + 2s = 46.1 + 2(8.64) = 63.38$
We expect at least 75% of the counties in Georgia to have between 28.82% and 63.38% Democrats.
(c) $\bar{x} = 46.15$, $s \approx 8.63$
6. (a) Weighted average = $\frac{\Sigma xw}{\Sigma w} = \frac{92(0.05) + 73(0.08) + 81(0.08) + 85(0.15) + 87(0.15) + 83(0.15) + 90(0.34)}{0.05 + 0.08 + 0.08 + 0.15 + 0.15 + 0.15 + 0.34} = \frac{85.77}{1} = 85.77$
(b) Weighted average = $\frac{\Sigma xw}{\Sigma w} = \frac{20(0.05) + 73(0.08) + 81(0.08) + 85(0.15) + 87(0.15) + 83(0.15) + 90(0.34)}{1} = 82.17$
7. Mean weight = $\frac{2{,}500}{16} = 156.25$. The mean weight is 156.25 lb.
8. (a) Lowest value = 7.8; highest value = 29.5. There are 72 data values. Median = $\frac{20.2 + 20.3}{2} = 20.25$. There are 36 values above and 36 values below the $Q_2$ position.
$Q_1 = \frac{14.0 + 14.4}{2} = 14.2$; $Q_3 = \frac{23.8 + 23.8}{2} = 23.8$
(b) IQR = 23.8 − 14.2 = 9.6 kilograms
(c) [Boxplot of Kilograms; vertical axis: kilograms, 10 to 30]
(d) The median is closer to the maximum value, indicating that the higher weights are more concentrated than the lower weights. The lower whisker is also longer than the upper, which emphasizes again skewness toward the lower values. Yes, the lower half shows slightly more spread, indicating skewness to the left (low).
9. (a) A college degree does not guarantee an increase of 83.4% in earnings compared with a high-school diploma. This statement is based on averages.
(b) We compute as follows: $\bar{x} - 2s = \$51{,}206 - 2(\$8{,}500) = \$34{,}206$; $\bar{x} + 2s = \$51{,}206 + 2(\$8{,}500) = \$68{,}206$
(c) $\bar{x} = \frac{(0.46)(4{,}500) + (0.21)(7{,}500) + (0.07)(12{,}000) + (0.08)(18{,}000) + (0.09)(24{,}000) + (0.09)(31{,}000)}{0.46 + 0.21 + 0.07 + 0.08 + 0.09 + 0.09} = \$10{,}875$
10. (a) Order the data from smallest to largest. Lowest value = 6; highest value = 16. There are 50 data values. Median = $\frac{11 + 11}{2} = 11$. There are 25 values above and 25 values below the $Q_2$ position.
$Q_1 = 10$; $Q_3 = 13$; $IQR = Q_3 - Q_1 = 13 - 10 = 3$
[Boxplot of Soil Water Content; vertical axis: soil water content, 5.0 to 17.5]
(b)
Class | Midpoint x | f | xf | x²f
6–8 | 7 | 4 | 28 | 196
9–11 | 10 | 24 | 240 | 2,400
12–14 | 13 | 15 | 195 | 2,535
15–17 | 16 | 7 | 112 | 1,792
n = Σf = 50; Σxf = 575; Σx²f = 6,923
$\bar{x} = \frac{\Sigma xf}{n} = \frac{575}{50} = 11.5$
$s = \sqrt{\frac{\Sigma x^2 f - \frac{(\Sigma xf)^2}{n}}{n - 1}} = \sqrt{\frac{6{,}923 - \frac{(575)^2}{50}}{49}} = \sqrt{\frac{310.5}{49}} \approx 2.52$
$\bar{x} - 2s = 11.5 - 2(2.52) = 6.46$; $\bar{x} + 2s = 11.5 + 2(2.52) = 16.54$
We expect at least 75% of the soil water content measurements to fall in the interval 6.46–16.54.
(c) Using a TI-83, $\bar{x} \approx 11.48$; $s \approx 2.44$
11.
Weighted average = $\frac{\Sigma xw}{\Sigma w} = \frac{5(2) + 8(3) + 7(3) + 9(5) + 7(3)}{2 + 3 + 3 + 5 + 3} = \frac{121}{16} \approx 7.56$
Cumulative Review Problems Chapters 1, 2, 3
1. (a) Median, percentile (b) Mean, variance, standard deviation
2. (a) Gap between first bar and rest of bars or between last bar and rest of bars
(b) Large gap between data on far left side or far right side and rest of data
(c) Several empty stems after stem including lowest values or before stem including highest values
(d) Data beyond fences placed at $Q_1 - 1.5(IQR)$ and at $Q_3 + 1.5(IQR)$.
3. (a) Same (b) Set B has higher mean. (c) Set B has higher standard deviation. (d) Set B has much longer whisker beyond $Q_3$.
4. (a) In Set A, 86 is the relatively higher score because a larger percentage of scores fall below it.
(b) In Set B because 86 is more standard deviations above the mean
5. One could assign a consecutive number to each well in West Texas and then use a random-number table or a computer package to draw the simple random sample.
6. The pH levels are ratios because the values can be multiplied. Also, 0 pH is meaningful and not just a place on the scale.
7. Use the one’s digit for the stem and the tenths decimal for the leaves. Split each stem into five rows. Here, 7 | 0 = 7.0.
7 | 000000001111111111
7 | 222222222233333333333
7 | 44444444455555555
7 | 666666666777777
7 | 8888899999
8 | 01111111
8 | 2222222
8 | 45
8 | 67
8 | 88
8.
Class Limits | Class Boundaries | Midpoints | Frequency | Relative Frequency | Cumulative Relative Frequency
7.0–7.3 | 6.95–7.35 | 7.15 | 39 | 0.382 | 0.382
7.4–7.7 | 7.35–7.75 | 7.55 | 32 | 0.314 | 0.696
7.8–8.1 | 7.75–8.15 | 7.95 | 18 | 0.176 | 0.872
8.2–8.5 | 8.15–8.55 | 8.35 | 9 | 0.088 | 0.960
8.6–8.9 | 8.55–8.95 | 8.75 | 4 | 0.039 | 0.999
[Histogram of pH Level: frequency versus pH, with class boundaries 6.95, 7.35, 7.75, 8.15, 8.55, 8.95]
[Histogram of pH Level: relative frequency (%) versus pH, with the same class boundaries]
To construct the frequency polygon, draw a dot at the minimum class boundary, at each midpoint, and at the maximum class boundary. Then connect the dots.
9. To draw the ogive, the vertical axis is labeled with relative frequency, and the horizontal axis is labeled with the upper class boundaries. Draw a dot at the minimum class boundary and zero, and then draw a dot at each upper class boundary and the corresponding cumulative frequency. Connect the dots.
10. Range = 8.8 – 7.0 = 1.8; $\bar{x} = \frac{\Sigma x}{n} = \frac{7.0 + 7.0 + \cdots + 8.8}{102} = 7.58$; Median = $\frac{7.5 + 7.5}{2} = 7.5$; Mode = 7.3
11. (a) The students can verify the figures using a calculator or a statistics package.
(b) $s^2 = \frac{\Sigma(x - \bar{x})^2}{n - 1} = 0.1984$; $s = \sqrt{s^2} = \sqrt{0.1984} = 0.4454$; $CV = \frac{s}{\bar{x}} = \frac{0.4454}{7.58} = 0.059 = 5.9\%$
The sample standard deviation is only 5.9% of the mean. This appears to be small.
12. $\bar{x} - 2(s) = 7.58 - 2(0.4454) = 6.69$; $\bar{x} + 2(s) = 7.58 + 2(0.4454) = 8.47$
Thus at least 75% of all pH levels are found between 6.69 and 8.47.
13. We know the minimum value is 7.0, the maximum value is 8.8, and the median is 7.5. Using Minitab, we find that $Q_1 = 7.2$ and $Q_3 = 7.9$. Thus IQR = 7.9 – 7.2 = 0.7.
[Boxplot of pH Levels for West Texas; vertical axis: pH, 7.0 to 9.0]
14. The histogram shows that the distribution is skewed right. Lower values are more common because the height of the bars is higher.
15. 87.2% of the wells have a pH of less than 8.15.
57.8% of the wells could be used for the irrigation. Here, 57.8% = 31.4% + 17.6% + 8.8%.
16. There do not appear to be any outliers because there are no large gaps in the data set. Eight are neutral.
17. Half the wells are found to have a pH between 7.2 and 7.9. There is skewness toward the high values, with half the wells having a pH between 7.5 and 8.8. The boxplot and the histogram are consistent because both show the distribution to be right skewed.
18. Answers will vary. Good reports will include the preceding graphs, measures of center, measures of variation, and a comment about any unusual features.
Chapter 4: Elementary Probability Theory
Section 4.1
1. Equally likely outcomes, relative frequency, intuition
2. The complement is “not rain today.” This probability is 100% – 30% = 70%.
3. (a) The probability of a certain event is 1. (b) The probability of an impossible event is 0.
4. The law of large numbers states that in the long run, as the sample size or number of trials increases, the relative frequency of outcomes approaches the theoretical probability of the outcome. Five hundred trials are better because the law of large numbers works better for larger samples.
5. No. The probability of throwing tails on the second toss is 0.50 regardless of the outcome on the first toss.
6. (a) Probabilities must be between 0 and 1 inclusive. –0.41 < 0
(b) Probabilities must be between 0 and 1 inclusive. 1.21 > 1
(c) 120% = 1.20, and probabilities must be between 0 and 1 inclusive. 1.20 > 1
(d) Yes, 0 ≤ 0.56 ≤ 1.
7. The resulting relative frequency can be used as an estimate of the true probability of all Americans who can wiggle their ears.
8. The resulting relative frequency can be used as an estimate of the true probability of all Americans who can raise one eyebrow.
9.
(a) P(no similar preferences) = P(0) = 15/375, P(1) = 71/375, P(2) = 124/375, P(3) = 131/375, P(4) = 34/375
(b) (15 + 71 + 124 + 131 + 34)/375 = 375/375 = 1, yes. Personality types were classified into four main preferences; all possible numbers of shared preferences were considered. The sample space is 0, 1, 2, 3, and 4 shared preferences.

10. (a) The sample space would be 1, 2, 3, 4, 5, and 6 dots. If the die is fair, all outcomes will be equally likely.
(b) P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6 because the die faces are equally likely, and there are six outcomes. The probabilities should and do add to 1: 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 6(1/6) = 1, because all possible outcomes have been considered.
(c) P(number of dots < 5) = P(1 or 2 or 3 or 4 dots) = P(1) + P(2) + P(3) + P(4) = 1/6 + 1/6 + 1/6 + 1/6 = 4/6 = 2/3, or P(dots < 5) = 1 – P(5 or 6 dots) = 1 – 1/3 = 2/3. (The applicable probability rule used here will be discussed in the next section of the text; rely on your common sense for now.)
(d) Complementary event rule: P(A) = 1 – P(not A). P(5 or 6 dots) = 1 – P(1 or 2 or 3 or 4 dots) = 1 – 2/3 = 1/3, or P(5 or 6) = P(5) + P(6) = 1/6 + 1/6 = 2/6 = 1/3.

11. (a) Note: “Includes the left limit but not the right limit” means 6 A.M. ≤ time t < noon, noon ≤ t < 6 P.M., 6 P.M. ≤ t < midnight, midnight ≤ t < 6 A.M.
P(best idea 6 A.M.–12 noon) = 290/966 ≈ 0.30
P(best idea 12 noon–6 P.M.) = 135/966 ≈ 0.14
P(best idea 6 P.M.–12 midnight) = 319/966 ≈ 0.33
P(best idea from 12 midnight to 6 A.M.) = 222/966 ≈ 0.23
(b) The probabilities add up to 1. They should add up to 1 provided that the intervals do not overlap and each inventor chose only one interval. The sample space is the set of four time intervals.

12.
(a) P(germinate) = (number germinated)/(number planted) = 2,430/3,000 = 0.81
(b) P(not germinate) = (3,000 – 2,430)/3,000 = 570/3,000 = 0.19
(c) The sample space is two outcomes, germinate and not germinate. P(germinate) + P(not germinate) = 0.81 + 0.19 = 1. The probabilities of all the outcomes in the sample space should and do sum to 1.
(d) No, because P(germinate) = 0.81 ≠ P(not germinate) = 0.19. If they were equally likely, each would have probability 1/2 = 0.5.

13. (a) Given: Odds in favor of A are n:m (i.e., n/m). Show: P(A) = n/(m + n).
Proof: Odds in favor of A are P(A)/P(not A) by definition.
P(not A) = 1 – P(A) (complementary events)
n/m = P(A)/P(not A) = P(A)/[1 – P(A)] (substitution)
n[1 – P(A)] = m[P(A)] (cross multiply)
n – n[P(A)] = m[P(A)]
n = n[P(A)] + m[P(A)]
n = (n + m)[P(A)]
So P(A) = n/(n + m), as was to be shown.
(b) Odds of a successful call are 2 to 15. Now 2 to 15 can be written as 2:15 or 2/15. From part (a): if the odds are 2:15 (let n = 2, m = 15), then P(sale) = n/(n + m) = 2/(2 + 15) = 2/17 ≈ 0.118.
(c) Odds of free throw are 3 to 5, i.e., 3:5. Let n = 3 and m = 5 here; then, from part (a): P(free throw) = n/(n + m) = 3/(3 + 5) = 3/8 = 0.375

14. (a) Given: Odds against W are a:b or a/b. Show: P(not W) = a/(a + b).
Proof: Odds against W are P(not W)/P(W) by definition.
P(W) = 1 – P(not W) (complementary events)
a/b = P(not W)/P(W) (substitution)
a/b = P(not W)/[1 – P(not W)] (substitution)
b[P(not W)] = a[1 – P(not W)] (cross-multiply)
b[P(not W)] = a – a[P(not W)]
b[P(not W)] + a[P(not W)] = a
(a + b)[P(not W)] = a
P(not W) = a/(a + b), as was to be shown.
(b) Point Given’s betting odds are 9:5.
Betting odds are based on the probability that the horse does not win, so odds against Point Given (PG) winning are P(not PG wins)/P(PG wins). Let a = 9 and b = 5 in part (a) formula. From part (a), P(not PG wins) = a/(a + b) = 9/(9 + 5) = 9/14, but event “not PG wins” is the same as “PG loses,” so P(PG loses) = 9/14 ≈ 0.64, and P(PG wins) = 1 – 9/14 = 5/14 ≈ 0.36.
(c) Betting odds for Monarchos are 6:1. Betting odds are based on the probability that the horse does not win; i.e., the horse loses. Let W be the event that Monarchos wins. From part (a), if the odds against W are given as a:b, then P(not W) = a/(a + b). Let a = 6 and b = 1 in the part (a) formula, so
P(not W) = 6/(6 + 1) = 6/7
P(not W) = P(Monarchos loses) = 6/7 ≈ 0.86
P(Monarchos wins) = P(W) = 1 – P(not W) = 1 – 6/7 = 1/7 ≈ 0.14
(d) Invisible Ink was given betting odds of 30 to 1; i.e., odds against Invisible Ink winning were 30/1. Let W denote the event that Invisible Ink wins. Let a = 30, b = 1 in formula from part (a). Then, from part (a), P(not W) = a/(a + b) = 30/(30 + 1) = 30/31; i.e.,
P(Invisible Ink loses) = 30/31 ≈ 0.97
P(Invisible Ink wins) = 1 – P(Invisible Ink loses) = 1 – 30/31 = 1/31 ≈ 0.03

15. One approach is to make a table showing the information about the 127 people who walked by the store.

                      Buy    Did Not Buy     Row Total
Came into the store   25     58 – 25 = 33    58
Did not come in        0     69              127 – 58 = 69
Column total          25     102             127

If 58 came in, 69 didn’t; 25 of the 58 bought something, so 33 came in but didn’t buy anything. Those who did not come in couldn’t buy anything. The row entries must sum to the row totals, the column entries must sum to the column totals, and the row totals, as well as the column totals, must sum to the overall total, i.e., the 127 people who walked by the store.
Also, the four inner cells must sum to the overall total: 25 + 33 + 0 + 69 = 127. This kind of problem relies on formula (2), P(event A) = (number of outcomes favorable to A)/(total number of outcomes).
(a) P(A) = 58/127 ≈ 0.46. Here, we divide by 127 people.
(b) P(A) = 25/58 ≈ 0.43. Here, we divide by 58 people (only those who entered).
(c) P(A) = P(Enter and buy) = (58/127) × (25/58) = 25/127 ≈ 0.20. Or similarly, read from the table that 25 people both entered and bought something. Divide this by the total number of people, namely, 127.
(d) P(A) = P(Buy nothing) = 33/58 ≈ 0.57. Here, we divide by 58 people.

Section 4.2

1. No. Mutually exclusive events cannot occur at the same time.

2. If A and B are independent, then P(A) = P(A | B). Therefore, P(A | B) = 0.3.

3. (a) Event A cannot occur if event B has occurred. Therefore, P(A | B) = 0.
(b) Since we are told that P(A) ≠ 0, and we have determined that P(A | B) = 0, we can deduce that P(A) ≠ P(A | B). Therefore, events A and B are not independent.

4. (a) P(A and B) = P(A) × P(B) if events A and B are independent. This product can equal zero only if either P(A) = 0 or P(B) = 0 (or both). We are told that P(A) ≠ 0 and that P(B) ≠ 0. Therefore, P(A and B) ≠ 0.
(b) By the preceding line, the definition of mutually exclusive events is violated. Thus A and B are not mutually exclusive.

5. (a) P(A and B) (b) P(B | A) (c) P(A^c | B) (d) P(A or B) (e) P(A or B^c)

6. (a) P(A^c or B) (b) P(B | A) (c) P(A | B) (d) P(A and B^c) (e) P(A and B)

7. (a) Green and blue are mutually exclusive because each M&M candy is only one color. P(green or blue) = P(green) + P(blue) = 10% + 10% = 20% = 0.20.
(b) Yellow and red are mutually exclusive once again because each candy is only one color. P(yellow or red) = P(yellow) + P(red) = 20% + 20% = 40% = 0.40.
(c) Use the complementary event.
P(not purple) = 1 – P(purple) = 1 – 0.20 = 0.80 = 80%

8. The total number of arches tabled is 288. Arch heights are mutually exclusive.
(a) P(3 to 9 feet) = 111/288
(b) P(30 feet or taller) = P(30 to 49) + P(50 to 74) + P(75 and higher) = 30/288 + 33/288 + 18/288 = 81/288
(c) P(3 to 49 feet) = P(3 to 9) + P(10 to 29) + P(30 to 49) = 111/288 + 96/288 + 30/288 = 237/288
(d) P(10 to 74 feet) = P(10 to 29) + P(30 to 49) + P(50 to 74) = 96/288 + 30/288 + 33/288 = 159/288
(e) P(75 feet or taller) = 18/288

Hint: For Problems 9–12, refer to Figure 4-2 if necessary. Think of the outcomes as an (x, y) ordered pair.

9. (a) Yes, the outcome of the red die does not influence the outcome of the green die.
(b) P(5 on green and 3 on red) = P(5 on green) · P(3 on red) = (1/6)(1/6) = 1/36 ≈ 0.028
(c) P(3 on green and 5 on red) = P(3 on green) · P(5 on red) = (1/6)(1/6) = 1/36 ≈ 0.028
(d) P[(5 on green and 3 on red) or (3 on green and 5 on red)] = P(5 on green and 3 on red) + P(3 on green and 5 on red) = 1/36 + 1/36 = 2/36 = 1/18 ≈ 0.056 (because they are mutually exclusive outcomes).

10. (a) Yes.
(b) P(1 on green and 2 on red) = P(1 on green) · P(2 on red) = (1/6)(1/6) = 1/36
(c) P(2 on green and 1 on red) = P(2 on green) · P(1 on red) = (1/6)(1/6) = 1/36
(d) P[(1 on green and 2 on red) or (2 on green and 1 on red)] = P(1 on green and 2 on red) + P(2 on green and 1 on red) = 1/36 + 1/36 = 2/36 = 1/18 (because they are mutually exclusive outcomes).

11.
(a) We can obtain a sum of 6 as follows: 1 + 5 = 6, 2 + 4 = 6, 3 + 3 = 6, 4 + 2 = 6, 5 + 1 = 6.
P(sum is 6) = P[(1, 5) or (2, 4) or (3 on red, 3 on green) or (4, 2) or (5, 1)]
= P(1, 5) + P(2, 4) + P(3, 3) + P(4, 2) + P(5, 1) because the (red, green) outcomes are mutually exclusive
= (1/6)(1/6) + (1/6)(1/6) + (1/6)(1/6) + (1/6)(1/6) + (1/6)(1/6) because the red die outcome is independent of the green die outcome
= 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 5/36
(b) We can obtain a sum of 4 as follows: 1 + 3 = 4, 2 + 2 = 4, 3 + 1 = 4.
P(sum is 4) = P[(1, 3) or (2, 2) or (3, 1)]
= P(1, 3) + P(2, 2) + P(3, 1) because the (red, green) outcomes are mutually exclusive
= (1/6)(1/6) + (1/6)(1/6) + (1/6)(1/6) because the red die outcome is independent of the green die outcome
= 1/36 + 1/36 + 1/36 = 3/36 = 1/12
(c) You cannot roll a sum of 6 and a sum of 4 at the same time. These are mutually exclusive events. P(sum of 6 or 4) = P(sum of 6) + P(sum of 4) = 5/36 + 3/36 = 8/36 = 2/9

12.
(a) We can obtain a sum of 7 as follows: 1 + 6 = 7, 2 + 5 = 7, 3 + 4 = 7, 4 + 3 = 7, 5 + 2 = 7, 6 + 1 = 7.
P(sum is 7) = P[(1, 6) or (2, 5) or (3, 4) or (4, 3) or (5, 2) or (6, 1)]
= P(1, 6) + P(2, 5) + P(3, 4) + P(4, 3) + P(5, 2) + P(6, 1) because the (red, green) outcomes are mutually exclusive
= (1/6)(1/6) + (1/6)(1/6) + (1/6)(1/6) + (1/6)(1/6) + (1/6)(1/6) + (1/6)(1/6) because the red die outcome is independent of the green die outcome
= 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 6/36 = 1/6
(b) We can obtain a sum of 11 as follows: 5 + 6 = 11 or 6 + 5 = 11.
P(sum is 11) = P[(5, 6) or (6, 5)]
= P(5, 6) + P(6, 5) because the (red, green) outcomes are mutually exclusive
= (1/6)(1/6) + (1/6)(1/6) because the red die outcome is independent of the green die outcome
= 1/36 + 1/36 = 2/36 = 1/18
(c) You cannot roll a sum of 7 and a sum of 11 at the same time. These are mutually exclusive events. P(sum is 7 or 11) = P(sum is 7) + P(sum is 11) = 6/36 + 2/36 = 8/36 = 2/9

13. (a) No, the draws are not independent. The key idea is “without replacement” because the probability of the second card drawn depends on the first card drawn. Let the card draws be represented by an (x, y) ordered pair. For example, (K, 6) means the first card drawn was a king and the second card drawn was a 6. Here the order of the cards is important.
(b) P(ace on first draw and king on second draw) = P(ace, king) = (4/52)(4/51) = 16/2,652 = 4/663. There are four aces and four kings in the deck. Once the first card is drawn and not replaced, there are only 51 cards left to draw from, but all the kings are available.
(c) P(king, ace) = (4/52)(4/51) = 16/2,652 = 4/663
(d) P(ace and king in either order) = P[(ace, king) or (king, ace)] = P(ace, king) + P(king, ace) because these two outcomes are mutually exclusive = 16/2,652 + 16/2,652 = 32/2,652 = 8/663

14. (a) No, the draws are not independent. The key idea is “without replacement” because the probability of the second card drawn depends on the first card drawn. Let the card draws be represented by an (x, y) ordered pair. For example, (K, 6) means the first card drawn was a king and the second card drawn was a 6. Here the order of the cards is important.
(b) P(3, 10) = P[(3 on 1st) and (10 on 2nd, given 3 on 1st)] = P(3 on 1st) · P(10 on 2nd, given 3 on 1st) = (4/52)(4/51) = 16/2,652 = 4/663 ≈ 0.006
(c) P(10, 3) = P[(10 on 1st) and (3 on 2nd, given 10 on 1st)] = P(10 on 1st) · P(3 on 2nd, given 10 on 1st) = (4/52)(4/51) = 16/2,652 = 4/663 ≈ 0.006
(d) P[(3, 10) or (10, 3)] = P(3, 10) + P(10, 3) because these two outcomes are mutually exclusive. = 4/663 + 4/663 = 8/663 ≈ 0.012

15. (a) Yes, the draws are independent. The key idea is “with replacement.” When the first card drawn is replaced, the sample space is the same for the second card as it was for the first card. In fact, it is possible to draw the same card twice. Let the card draws be represented by an (x, y) ordered pair; for example, (K, 6) means a king was drawn, replaced, and then the second card, a 6, was drawn.
(b) P(A, K) = P(A) · P(K) because they are independent. = (4/52)(4/52) = 16/2,704 = 1/169
(c) P(K, A) = P(K) · P(A) because they are independent. = (4/52)(4/52) = 16/2,704 = 1/169
(d) P[(A, K) or (K, A)] = P(A, K) + P(K, A) because the two outcomes are mutually exclusive. = 1/169 + 1/169 = 2/169

16. (a) Yes, the draws are independent.
The key idea is “with replacement.” When the first card drawn is replaced, the sample space is the same for the second card as it was for the first card. In fact, it is possible to draw the same card twice. Let the card draws be represented by an (x, y) ordered pair; for example, (K, 6) means a king was drawn, replaced, and then the second card, a 6, was drawn.
(b) P(3, 10) = P(3) · P(10) because draws are independent. = (4/52)(4/52) = 16/2,704 = 1/169 ≈ 0.0059
(c) P(10, 3) = P(10) · P(3) because of independence. = (4/52)(4/52) = 16/2,704 = 1/169 ≈ 0.0059
(d) P[(3, 10) or (10, 3)] = P(3, 10) + P(10, 3) because the two outcomes are mutually exclusive. = 1/169 + 1/169 = 2/169 ≈ 0.0118

17. (a) P(6 years old or older) = 27% + 14% + 22% = 63%
(b) P(12 years old or younger) = 1 – P(13 years old or older) = 100% – 22% = 78%
(c) P(Between 6 and 12 years old) = 27% + 14% = 41%
(d) P(Between 2 and 9 years old) = 22% + 27% = 49%
The 13-and-older category may include children up to 17 or 18 years old. This is a larger category.

18. Let S denote “senior.” Let F denote “got the flu.” We are given the following probabilities: P(F | S) = 0.14, P(F | S^c) = 0.24, P(S) = 0.125, P(S^c) = 0.875.
(a) P(S and F) = P(S) × P(F | S) = (0.125) × (0.14) = 0.0175
(b) P(S^c and F) = P(S^c) × P(F | S^c) = (0.875) × (0.24) = 0.21
(c) Here, P(S) = 0.95, so P(S^c) = 1 – 0.95 = 0.05.
Part (a): P(S and F) = P(S) × P(F | S) = (0.95) × (0.14) = 0.133
Part (b): P(S^c and F) = P(S^c) × P(F | S^c) = (0.05) × (0.24) = 0.012
(d) Here, P(S) = P(S^c) = 0.50.
Part (a): P(S and F) = P(S) × P(F | S) = (0.50) × (0.14) = 0.07
Part (b): P(S^c and F) = P(S^c) × P(F | S^c) = (0.50) × (0.24) = 0.12

19. Let T denote “telling the truth.” Let L denote “machine catches a person lying.” We are given the following probabilities: P(L | T^c) = 0.72, P(L | T) = 0.07.
(a) Given P(T) = 0.90.
Then P(T and L) = P(T) × P(L | T) = (0.90) × (0.07) = 0.063
(b) Given P(T^c) = 0.10. Then P(T^c and L) = P(T^c) × P(L | T^c) = (0.10) × (0.72) = 0.072
(c) Given P(T) = P(T^c) = 0.50. Then P(T and L) = (0.50) × (0.07) = 0.035 and P(T^c and L) = (0.50) × (0.72) = 0.36
(d) Given P(T) = 0.15 and P(T^c) = 0.85. Then P(T and L) = (0.15) × (0.07) = 0.0105 and P(T^c and L) = (0.85) × (0.72) = 0.612

20. (a) We want to solve for P(T^c). There are two possibilities when the polygraph says that the person is lying: Either the polygraph is right, or the polygraph is wrong. If the polygraph is right, the polygraph results show “lying,” and the person is not telling the truth; i.e., P(L and not T). If the polygraph is wrong, then the polygraph results show “lying,” but in fact, the person is telling the truth; i.e., P(L and T).
P(L) = P(L and T^c) + P(L and T)
= [P(T^c) × P(L | T^c)] + [P(T) × P(L | T)]
= [P(T^c) × P(L | T^c)] + {[1 – P(T^c)] × P(L | T)}
We are told that P(L) = 0.30, so
0.30 = [P(T^c) × P(L | T^c)] + {[1 – P(T^c)] × P(L | T)} (**)
= [P(T^c) × 0.72] + {[1 – P(T^c)] × 0.07}
= (0.72) × P(T^c) + {0.07 – [0.07 × P(T^c)]}
0.23 = P(T^c) × (0.72 – 0.07) = P(T^c) × (0.65)
P(T^c) = 0.23/0.65 = 0.354 = 35.4%
(b) Here, P(L) = 70% = 0.70. Replace the 0.30 with 0.70 in (**) and solve. P(T^c) = 0.63/0.65 = 0.969

21. (a) P(S) = 686/1,160, P(S | A) = 270/580, P(S | Pa) = 416/580
(b) No, they are not independent. P(S | Pa) ≠ P(S) based on the previous part.
(c) P(A and S) = 270/1,160 using the table. P(Pa and S) = 416/1,160 using the table.
(d) P(N) = 474/1,160, P(N | A) = 310/580
(e) No, they are not independent. P(N | A) ≠ P(N) based on the preceding part.
(f) P(A or S) = P(A) + P(S) – P(A and S) = 580/1,160 + 686/1,160 – 270/1,160 = 996/1,160

22.
(a) P(+ | condition present) = 110/130
(b) P(– | condition present) = 20/130
(c) P(– | condition absent) = 50/70
(d) P(+ | condition absent) = 20/70
(e) P(condition present and +) = P(condition present) × P(+ | condition present) = (130/200)(110/130) = 110/200
(f) P(condition present and –) = P(condition present) × P(– | condition present) = (130/200)(20/130) = 20/200

23. Let C denote the presence of the condition and not C denote absence of the condition.
(a) P(+ | C) = 72/154
(b) P(– | C) = 82/154
(c) P(– | not C) = 79/116
(d) P(+ | not C) = 37/116
(e) P(C and +) = P(C) × P(+ | C) = (154/270)(72/154) = 72/270
(f) P(C and –) = P(C) × P(– | C) = (154/270)(82/154) = 82/270

24. (a) P(10 to 14 years) = 291/2008
(b) P(10 to 14 years | East) = 77/452
(c) P(at least 10 years) = (291 + 535)/2008 = 826/2008
(d) P(at least 10 years | West) = (45 + 86)/373 = 131/373
(e) P(West | less than 1 year) = 41/157
(f) P(South | less than 1 year) = 53/157
(g) P(1 or more years | East) = 1 – P(less than 1 year | East) = 1 – 32/452 = 420/452
(h) P(1 or more years | West) = 1 – P(less than 1 year | West) = 1 – 41/373 = 332/373
(i) We can check whether P(15 or more years) = P(15 or more years | East). If these probabilities are equal, then the events are independent. P(15 or more years) = 535/2008 ≈ 0.266 and P(15 or more years | East) = 118/452 ≈ 0.261. Since the probabilities are not equal, the events are not independent.

25. Given: Let A be the event that a new store grosses > $940,000 in year 1; then A^c is the event the new store grosses ≤ $940,000 the first year. Let B be the event that the store grosses > $940,000 in the second year; then B^c is the event the store grosses ≤ $940,000 in the second year of operation.
2-Year Results   Translations
A and B          Profitable both years
A and B^c        Profitable first but not second year
A^c and B        Profitable second but not first year
A^c and B^c      Not profitable either year

P(A) = 65% (show profit in first year), P(A^c) = 35%
P(B) = 71% (show profit in second year), P(B^c) = 29%
P(close) = P(A^c and B^c)
P(B, given A) = 87%
(a) P(A) = 65% = 0.65
(b) P(B) = 71% = 0.71
(c) P(B | A) = 87% = 0.87
(d) P(A and B) = P(A) × P(B | A) = (0.65)(0.87) = 0.5655 ≈ 0.57
(e) P(A or B) = P(A) + P(B) – P(A and B) = 0.65 + 0.71 – 0.57 = 0.79
(f) P(not closed) = P(show a profit in year 1 or year 2 or both) = 0.79. P(closed) = 1 – P(not closed) = 1 – 0.79 = 0.21

26. P(female) = 85%, so P(male) = 15%. P(BSN | female) = 70%. P(BSN | male) = 90%.
(a) P(BSN | female) = 70% = 0.70
(b) P(BSN and female) = P(female) × P(BSN | female) = (0.85) × (0.70) = 0.595
(c) P(BSN | male) = 90% = 0.90
(d) P(BSN and male) = P(male) × P(BSN | male) = (0.15)(0.90) = 0.135
(e) Of the graduates, some are female and some are male. We can add the mutually exclusive probabilities. P(BSN) = [P(BSN | female) × P(female)] + [P(BSN | male) × P(male)] = [(0.70) × (0.85)] + [(0.90) × (0.15)] = 0.73
(f) The phrase “will graduate and is female” describes the proportion of all students who are female and will graduate. The phrase “will graduate, given female” describes the proportion of the females who will graduate. Observe from parts (a) and (b) that the probabilities are indeed different.

27. Let TB denote that the person has tuberculosis. Let + denote the test for tuberculosis indicates the presence of the disease. Let – denote the test for tuberculosis indicates the absence of the disease.
We are given the following probabilities: P(+ | TB) = 0.82 (sensitivity of the test), P(+ | TB^c) = 0.09 (false-positive rate), P(TB) = 0.04.
(a) P(TB and +) = P(+ | TB) × P(TB) = (0.82) × (0.04) = 0.0328
(b) P(TB^c) = 1 – P(TB) = 1 – 0.04 = 0.96
(c) P(TB^c and +) = P(+ | TB^c) × P(TB^c) = (0.09) × (0.96) = 0.0864

28. Known: Let A be the event the client relapses in phase I. Let B be the event the client relapses in phase II. Let C be the event that the client has no relapse in phase I; i.e., C = not A. Let D be the event that the client has no relapse in phase II; i.e., D = not B.
P(A) = 0.27, so P(A^c) = P(C) = 1 – 0.27 = 0.73
P(B) = 0.23, so P(B^c) = P(D) = 1 – 0.23 = 0.77
P(B^c | A^c) = P(D | C) = 0.95
P(B | A) = 0.70

Possible Outcomes     Translation
A, B                  Relapse in I, relapse in II
A^c, B (= C, B)       No relapse in I, relapse in II
A, B^c (= A, D)       Relapse in I, no relapse in II
A^c, B^c (= C, D)     No relapse in I, no relapse in II

(a) P(A) = 0.27, P(B) = 0.23, P(C) = 0.73, P(D) = 0.77
(b) P(B | A) = 0.70, P(D | C) = 0.95
(c) P(A and B) = P(A) × P(B | A) = (0.27) × (0.70) = 0.189. P(C and D) = P(C) × P(D | C) = (0.73) × (0.95) = 0.6935
(d) P(A or B) = P(A) + P(B) – P(A and B) = 0.27 + 0.23 – 0.189 = 0.311
(e) P(C and D) = 0.69
(f) P(A and B) = 0.189
(g) Translate as the inclusive or. P(A or B) = 0.31.

Section 4.3

1. The permutations rule counts the number of different arrangements of r items out of n distinct items. Here, the ordering matters. The combinations rule counts the number of groups of r items out of n distinct items. Here, the ordering does not matter. For a permutation, ABC is different from ACB. For a combination, ABC and ACB are the same item. The number of permutations is larger than the number of combinations.

2. A tree diagram lists all possible events.
The user of the diagram can trace the sequential event from the start to the end by following a distinct path along the branches. Counting the number of final branches gives the total number of outcomes.

3. (a) Use the combinations rule because we are concerned only with the groups of size five.
(b) Use the permutations rule because we are concerned with the number of different arrangements of size five.

4. Both methods are correct because you are counting the number of possible arrangements of five items taken five at a time.

5. (a) [Tree diagram not reproduced.]
(b) HHT, HTH, THH. There are three outcomes.
(c) There are eight possible outcomes, and three outcomes have exactly two heads. The probability is 3/8.

6. (a) [Tree diagram not reproduced.]
(b) H5, H6. There are two outcomes.
(c) There are 12 possible outcomes, and 2 outcomes meet the requirements. The probability is 2/12 = 1/6.

7. (a) [Tree diagram not reproduced.]
(b) Let P(x, y) be the probability of choosing an x-colored ball on the first draw and a y-colored ball on the second draw. Notice that the probabilities add to 1.
P(R, R) = (2/6)(1/5) = 2/30 = 1/15
P(R, B) = (2/6)(3/5) = 6/30 = 1/5
P(R, Y) = (2/6)(1/5) = 2/30 = 1/15
P(B, R) = (3/6)(2/5) = 6/30 = 1/5
P(B, B) = (3/6)(2/5) = 6/30 = 1/5
P(B, Y) = (3/6)(1/5) = 3/30 = 1/10
P(Y, R) = (1/6)(2/5) = 2/30 = 1/15
P(Y, B) = (1/6)(3/5) = 3/30 = 1/10

8. (a) For clarity, only a partial tree diagram is provided.
Each of the branches for B, C, and D would continue in the same manner as the fully expanded A branch.
(b) If the outcomes are equally likely, then P(all 3 correct) = (1/4)(1/4)(1/4) = 1/64.

9. Using the provided hint, we multiply. There are 4 × 3 × 2 × 1 = 4! = 24 possible wiring configurations.

10. Using the multiplication rule, we multiply. There are 4! = 24 possible ways to visit the four cities. This problem is exactly like Problem 9.

11. There are four fertilizers, three temperature zones for each fertilizer, and three water treatments for every fertilizer–temperature zone combination. She needs to test 4 × 3 × 3 = 36 plots.

12. (a) The die rolls are independent, so multiply the six outcomes for the first die and the six outcomes for the second die. There are 6 × 6 = 36 possible outcomes.
(b) There are three possible even outcomes per die. There are 3 × 3 = 9 outcomes.
(c) P(even, even) = 9/36 = 1/4 = 0.25, using P(event) = (number of favorable outcomes)/(total number of outcomes).

Problems 13, 14, 15, and 16 deal with permutations. Use P_{n,r} = n!/(n – r)! to count the number of ways r objects can be selected from n objects when ordering matters.

13. P_{5,2}: n = 5, r = 2. P_{5,2} = 5!/(5 – 2)! = (5 · 4 · 3 · 2 · 1)/3! = 20

14. P_{8,3}: n = 8, r = 3. P_{8,3} = 8!/(8 – 3)! = (8 · 7 · 6 · 5 · 4 · 3 · 2 · 1)/5! = (8 · 7 · 6 · 5!)/5! = 336

15. P_{7,7}: n = r = 7. P_{7,7} = 7!/(7 – 7)! = 7!/0! = 7! = 5,040 (recall 0! = 1). In general, P_{n,n} = n!/(n – n)! = n!/0! = n!/1 = n!.

16. P_{9,9}: n = r = 9. P_{9,9} = 9!/(9 – 9)! = 9!/0! = 9!/1 = 362,880

Problems 17, 18, 19, and 20 deal with combinations. Use C_{n,r} = n!/[r!(n – r)!] to count the number of ways r objects can be selected from n objects when ordering is irrelevant.

17. C_{5,2}: n = 5, r = 2. C_{5,2} = 5!/[2!(5 – 2)!] = 5!/(2!3!) = (5 · 4 · 3 · 2 · 1)/[(2 · 1)(3 · 2 · 1)] = 20/2 = 10

18.
C_{8,3}: n = 8, r = 3. C_{8,3} = 8!/[3!(8 – 3)!] = 8!/(3!5!) = (8 · 7 · 6 · 5!)/[(3 · 2 · 1) 5!] = 56

19. C_{7,7}: n = r = 7. C_{7,7} = 7!/[7!(7 – 7)!] = 7!/(7!0!) = 7!/[7!(1)] = 1 (recall 0! = 1). In general, C_{n,n} = n!/[n!(n – n)!] = n!/(n!0!) = n!/[n!(1)] = 1. There is only one way to choose n objects without regard to order.

20. C_{8,8}: n = r = 8. C_{8,8} = 8!/[8!(8 – 8)!] = 8!/(8!0!) = 8!/[8!(1)] = 1 (recall 0! = 1)

21. Since the order matters (first is day supervisor, second is night supervisor, and third is coordinator), this is a permutation of 15 nurse candidates to fill three positions. P_{15,3} = 15!/(15 – 3)! = 15!/12! = (15 · 14 · 13 · 12!)/12! = 2,730

22. Order matters here because the order of the finalists selected determines the prize awarded. P_{10,3} = 10!/(10 – 3)! = 10!/7! = (10 · 9 · 8 · 7!)/7! = 720

23. Order matters because the resulting sequence determines who wins first, second, and third place. P_{5,3} = 5!/(5 – 3)! = 5!/2! = 120/2 = 60

24. The order of the software packages selected is irrelevant, so use the combinations method. C_{10,3} = 10!/[3!(10 – 3)!] = 10!/(3!7!) = (10 · 9 · 8 · 7!)/(3!7!) = 720/6 = 120

25. The order of trainee selection is irrelevant, so use the combinations method. C_{15,5} = 15!/[5!(15 – 5)!] = 15!/(5!10!) = (15 · 14 · 13 · 12 · 11 · 10!)/(5!10!) = (15 · 14 · 13 · 12 · 11)/(5 · 4 · 3 · 2 · 1) = 3,003

26. The order of the problems selected is irrelevant, so use the combinations method.
(a) C_{12,5} = 12!/[5!(12 – 5)!] = 12!/(5!7!) = (12 · 11 · 10 · 9 · 8 · 7!)/(5!7!) = (12 · 11 · 10 · 9 · 8)/(5 · 4 · 3 · 2 · 1) = 792
(b) Jerry must have completed the same five problems as the professor selected to grade. P(Jerry chose the right problems) = 1/792 ≈ 0.001
(c) Silvia did seven problems, so she completed C_{7,5} = 7!/[5!(7 – 5)!] = 7!/(5!2!) = (7 · 6 · 5!)/(5!2!) = (7 · 6)/(2 · 1) = 42/2 = 21 possible subsets.
P(Silvia picked the correct set of graded problems) = 21/792 ≈ 0.027. Silvia increased her chances by a factor of 21 compared with Jerry.

27. (a) Six applicants are selected from among 12 without regard to order. C_{12,6} = 12!/(6!6!) = 479,001,600/(720)² = 924.
(b) This problem is asking, “In how many ways can six women be selected from seven applicants?” C_{7,6} = 7!/(6!1!) = 7
(c) P(event A) = (number of favorable outcomes)/(total number of outcomes). P(all hired are women) = 7/924 = 1/132 ≈ 0.008

Chapter 4 Review

1. (a) The individual does not own a cell phone.
(b) The individual owns both a cell phone and a laptop computer.
(c) The individual owns either a cell phone or a laptop computer or both.
(d) A laptop owner who owns a cell phone.
(e) A cell phone owner who owns a laptop.

2. (a) Only if events A and B are mutually exclusive. Then P(A and B) = 0 and P(A or B) = P(A) + P(B).
(b) Yes, see above.

3. (a) No, unless events A and B are independent. If they are not, we need either P(A | B) or P(B | A) to compute P(A and B).
(b) Yes, now we can compute P(A and B) = P(A) × P(B).

4. The information yields P(B | A) = 2. Probabilities must be between 0 and 1 inclusive. Also, P(A and B) cannot be greater than P(A) or P(B) individually.

5. P(asked) = 24% = 0.24. P(received | asked) = 45% = 0.45. P(asked and received) = P(asked) × P(received | asked) = (0.24) × (0.45) = 0.108 = 10.8%

6. P(asked) = 20% = 0.20. P(received | asked) = 59% = 0.59. P(asked and received) = P(asked) × P(received | asked) = (0.20) × (0.59) = 0.118 = 11.8%

7. (a) Throw a large number of similar thumbtacks or one thumbtack a large number of times, and record the relative frequency of the outcomes. Assume that the thumbtack falls either flat side down or tilted.
To estimate the probability the tack lands on its flat side, find the relative frequency of this occurrence, dividing the number of times this occurred by the total number of thumbtack tosses.
(b) The sample space consists of two outcomes: flat side down and tilted.
(c) P(flat side down) = 340/500 = 0.68. P(tilted) = 1 – 0.68 = 0.32

8. (a) P(N) = 470/1000 = 0.470, P(M) = 390/1000 = 0.390, P(S) = 140/1000 = 0.140
(b) P(N | W) = 420/500 = 0.840, P(S | W) = 20/500 = 0.040
(c) P(N | A) = 50/500 = 0.100, P(S | A) = 120/500 = 0.240
(d) P(N and W) = P(W) × P(N | W) = (0.50) × (0.84) = 0.42. P(M and W) = P(W) × P(M | W) = (0.50) × (0.12) = 0.06
(e) P(N or M) = P(N) + P(M) if mutually exclusive = 470/1,000 + 390/1,000 = 860/1,000 = 0.860. No reaction is mutually exclusive from a mild reaction; they cannot occur at the same time.
(f) If N and W were independent, P(N and W) = P(N) · P(W) = (0.470) × (0.500) = 0.235. However, from (d), we have P(N and W) = 0.420. They are not independent.

9. (a) Possible values for x are 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12.
(b) Below, the values for x are listed, along with the combinations required.

x    Combinations                                                 Number of ways
2    1 and 1                                                      1 way
3    1 and 2, or 2 and 1                                          2 ways
4    1 and 3, 2 and 2, 3 and 1                                    3 ways
5    1 and 4, 2 and 3, 3 and 2, 4 and 1                           4 ways
6    1 and 5, 2 and 4, 3 and 3, 4 and 2, 5 and 1                  5 ways
7    1 and 6, 2 and 5, 3 and 4, 4 and 3, 5 and 2, 6 and 1         6 ways
8    2 and 6, 3 and 5, 4 and 4, 5 and 3, 6 and 2                  5 ways
9    3 and 6, 4 and 5, 5 and 4, 6 and 3                           4 ways
10   4 and 6, 5 and 5, 6 and 4                                    3 ways
11   5 and 6, 6 and 5                                             2 ways
12   6 and 6                                                      1 way

Here x is the sum and P(x) is its probability. There are (6)(6) = 36 possible, equally likely outcomes. (The sums, however, are not equally likely.)
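The counts of ways in Problem 9(b) can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original solution; the names `ways` and `dist` are illustrative choices, and it assumes two fair six-sided dice as in the problem.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (die 1, die 2) outcomes and count each sum.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# P(x) = (number of ways to roll sum x) / 36, kept as an exact fraction.
dist = {x: Fraction(n, 36) for x, n in sorted(ways.items())}

for x, p in dist.items():
    print(f"x = {x:>2}: {ways[x]} way(s), P(x) = {p} (about {float(p):.3f})")
```

Exact fractions are used so that the probabilities can be confirmed to sum to exactly 1; note that `Fraction` prints values in lowest terms (for example 1/18 rather than 2/36).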
x = 2:   P(x) = 1/36 ≈ 0.028
x = 3:   P(x) = 2/36 ≈ 0.056
x = 4:   P(x) = 3/36 ≈ 0.083
x = 5:   P(x) = 4/36 ≈ 0.111
x = 6:   P(x) = 5/36 ≈ 0.139
x = 7:   P(x) = 6/36 ≈ 0.167
x = 8:   P(x) = 5/36 ≈ 0.139
x = 9:   P(x) = 4/36 ≈ 0.111
x = 10:  P(x) = 3/36 ≈ 0.083
x = 11:  P(x) = 2/36 ≈ 0.056
x = 12:  P(x) = 1/36 ≈ 0.028
10. P(pass 101) = 0.77
P(pass 102 | pass 101) = 0.90
P(pass 101 and pass 102) = P(pass 101) × P(pass 102 | pass 101) = (0.77) × (0.90) = 0.693
11. C_{8,2} = 8!/(2!6!) = (8 · 7 · 6!)/((2 · 1)6!) = 56/2 = 28
12. (a) P_{7,2} = 7!/(7 − 2)! = 7!/5! = 7(6) = 42
(b) C_{7,2} = 7!/(2!5!) = (7 · 6)/2 = 21
(c) P_{3,3} = 3!/(3 − 3)! = 3!/0! = 6
(d) C_{4,4} = 4!/(4!(4 − 4)!) = 4!/(4!0!) = 1
13. Five multiple choice questions, each with four possible answers (A, B, C, or D). There are 4 × 4 × 4 × 4 × 4 = 1,024 possible sequences, such as A, D, B, B.
P(getting the correct sequence) = 1/1024 ≈ 0.00098
14.
15. There are 10 possible numbers per turn of the dial, and we turn the dial three times. There are 10 × 10 × 10 = 1,000 possible combinations.
16. The combination uses the three numbers 2, 9, and 5, in an ordered sequence. The number of sequences is
P_{3,3} = 3!/(3 − 3)! = (3 · 2 · 1)/0! = 6.
The possible combinations are 259, 295, 529, 592, 925, and 952.

Chapter 5: The Binomial Probability Distribution and Related Topics
Section 5.1
1. (a) The number of traffic fatalities can be only a whole number. This is a discrete random variable.
(b) Distance can assume any value, so this is a continuous random variable.
(c) Time can take on any value, so this is a continuous random variable.
(d) The number of ships can be only a whole number. This is a discrete random variable.
(e) Weight can assume any value, so this is a continuous random variable.
2. (a) Speed can assume any value, so this is a continuous random variable.
(b) Age can take on any value, so this is a continuous random variable.
(c) Number of books can be only a whole number, so this is a discrete random variable.
(d) Weight can assume any value, so this is a continuous random variable.
(e) Number of lightning strikes can be only a whole number, so this is a discrete random variable.
3. (a) ΣP(x) = 0.25 + 0.60 + 0.15 = 1.00
Yes, this is a valid probability distribution because the sum of the probabilities is 1, each probability is between 0 and 1 inclusive, and each event is assigned a probability.
(b) ΣP(x) = 0.25 + 0.60 + 0.20 = 1.05
No, this is not a probability distribution because the probabilities sum to more than 1.
4. No, the expected value of a random variable x can be a value different from the exact values of x. For example, if we have the following random variable, the expected value is μ = (0 × 0.5) + (1 × 0.5) = 0.50.
x       0     1
P(x)    0.5   0.5
5. (a) Yes, seven of the ten digits are assigned to “make a basket.”
(b) Let S represent “make a basket” and F represent “miss.” We have
F F S S S F F F S S
(c) Yes, again, seven of the ten digits represent “make a basket.” We have
S S S S S S S S S S
6. (a) ΣP(x) = 0.07 + 0.44 + 0.24 + 0.14 + 0.11 = 1.00
Yes, this is a valid probability distribution because the events are distinct and the probabilities sum to 1.
(b) [Histogram: Age of Promotion Sensitive Shoppers; vertical axis: Percent.]
(c) μ = ΣxP(x) = 23(0.07) + 34(0.44) + 45(0.24) + 56(0.14) + 67(0.11) = 42.58
(d) σ = √(Σ(x − μ)^2 P(x))
= √((−19.58)^2(0.07) + (−8.58)^2(0.44) + (2.42)^2(0.24) + (13.42)^2(0.14) + (24.42)^2(0.11))
= √151.44 ≈ 12.31
7. (a) ΣP(x) = 0.21 + 0.14 + 0.22 + 0.15 + 0.20 + 0.08 = 1.00
Yes, this is a valid probability distribution because the events are distinct and the probabilities sum to 1.
(b) [Histogram of Income Distribution; vertical axis: Percent.]
(c) μ = ΣxP(x) = 10(0.21) + 20(0.14) + 30(0.22) + 40(0.15) + 50(0.20) + 60(0.08) = 32.3
(d) σ = √(Σ(x − μ)^2 P(x))
= √((−22.3)^2(0.21) + (−12.3)^2(0.14) + (−2.3)^2(0.22) + (7.7)^2(0.15) + (17.7)^2(0.20) + (27.7)^2(0.08))
= √259.71 ≈ 16.12
8. (a) ΣP(x) = 0.057 + 0.097 + 0.195 + 0.292 + 0.250 + 0.091 + 0.018 = 1.000
Yes, this is a valid probability distribution because the outcomes are distinct and the probabilities sum to 1.
(b) [Histogram of British Nurse Ages; vertical axis: Percent.]
(c) P(60 years of age or older) = P(64.5) + P(74.5) + P(84.5) = 0.250 + 0.091 + 0.018 = 0.359
The probability is 35.9%.
(d) μ = ΣxP(x) = 24.5(0.057) + 34.5(0.097) + 44.5(0.195) + 54.5(0.292) + 64.5(0.250) + 74.5(0.091) + 84.5(0.018) = 53.76
(e) σ = √(Σ(x − μ)^2 P(x))
= √((−29.26)^2(0.057) + (−19.26)^2(0.097) + (−9.26)^2(0.195) + (0.74)^2(0.292) + (10.74)^2(0.250) + (20.74)^2(0.091) + (30.74)^2(0.018))
= √186.65 ≈ 13.66
9. (a) [Histogram of Number of Trout Caught; vertical axis: Percent.]
(b) P(1 or more) = 1 − P(0) = 1 − 0.44 = 0.56
(c) P(2 or more) = P(2) + P(3) + P(4 or more) = 0.15 + 0.04 + 0.01 = 0.20
(d) μ = ΣxP(x) = 0(0.44) + 1(0.36) + 2(0.15) + 3(0.04) + 4(0.01) = 0.82
(e) σ = √(Σ(x − μ)^2 P(x))
= √((−0.82)^2(0.44) + (0.18)^2(0.36) + (1.18)^2(0.15) + (2.18)^2(0.04) + (3.18)^2(0.01))
= √0.8076 ≈ 0.899
10. ΣP(x) ≠ 1.000 owing to rounding.
(a) P(1 or more) = 1 − P(0) = 1 − 0.237 = 0.763
(b) P(2 or more) = P(2) + P(3) + P(4) + P(5) = 0.264 + 0.088 + 0.015 + 0.001 = 0.368
(c) P(4 or more) = P(4) + P(5) = 0.015 + 0.001 = 0.016
(d) μ = ΣxP(x) = 0(0.237) + 1(0.396) + 2(0.264) + 3(0.088) + 4(0.015) + 5(0.001) = 1.253
(e) σ = √(Σ(x − μ)^2 P(x))
= √((−1.253)^2(0.237) + (−0.253)^2(0.396) + (0.747)^2(0.264) + (1.747)^2(0.088) + (2.747)^2(0.015) + (3.747)^2(0.001))
= √0.941 ≈ 0.97
11. (a) P(win) = 15/719 ≈ 0.021
P(not win) = (719 − 15)/719 = 704/719 ≈ 0.979
(b) Expected earnings = (value of dinner)(probability of winning) = $35(15/719) ≈ $0.73
Lisa’s expected earnings are $0.73.
Contribution = $15 − $0.73 = $14.27
Lisa effectively contributed $14.27 to the hiking club.
12. (a) P(win) = 6/2,852 ≈ 0.0021
P(not win) = (2,852 − 6)/2,852 = 2,846/2,852 ≈ 0.9979
(b) Expected earnings = (value of cruise)(probability of winning) ≈ $2,000(0.0021) ≈ $4.20
Kevin spent 6($5) = $30 for the tickets. His expected earnings are less than the amount he paid.
Contribution = $30 − $4.20 = $25.80
Kevin effectively contributed $25.80 to the homeless center.
13. (a) P(60 years) = 0.01191
Expected loss = $50,000(0.01191) = $595.50
The expected loss for Big Rock Insurance is $595.50.
(b) Probability          Expected loss
P(61) = 0.01292      $50,000(0.01292) = $646
P(62) = 0.01396      $50,000(0.01396) = $698
P(63) = 0.01503      $50,000(0.01503) = $751.50
P(64) = 0.01613      $50,000(0.01613) = $806.50
Expected loss = $595.50 + $646 + $698 + $751.50 + $806.50 = $3,497.50
The total expected loss is $3,497.50.
(c) $3,497.50 + $700 = $4,197.50
They should charge $4,197.50.
(d) $5,000 − $3,497.50 = $1,502.50
They can expect to make $1,502.50.
14. (a) P(60 years) = 0.00756
Expected loss = $50,000(0.00756) = $378
The expected loss for Big Rock Insurance is $378.
(b) Probability          Expected loss
P(61) = 0.00825      $50,000(0.00825) = $412.50
P(62) = 0.00896      $50,000(0.00896) = $448
P(63) = 0.00965      $50,000(0.00965) = $482.50
P(64) = 0.01035      $50,000(0.01035) = $517.50
Expected loss = $378 + $412.50 + $448 + $482.50 + $517.50 = $2,238.50
The total expected loss is $2,238.50.
(c) $2,238.50 + $700 = $2,938.50
They should charge $2,938.50.
(d) $5,000 − $2,238.50 = $2,761.50
They can expect to make $2,761.50.
15. (a) W = x_1 − x_2; a = 1, b = −1
μ_W = μ_1 − μ_2 = 115 − 100 = 15
σ_W^2 = 1^2 σ_1^2 + (−1)^2 σ_2^2 = 12^2 + 8^2 = 208
σ_W = √208 ≈ 14.4
(b) W = 0.5x_1 + 0.5x_2; a = 0.5, b = 0.5
μ_W = 0.5μ_1 + 0.5μ_2 = 0.5(115) + 0.5(100) = 107.5
σ_W^2 = (0.5)^2 σ_1^2 + (0.5)^2 σ_2^2 = 0.25(12)^2 + 0.25(8)^2 = 52
σ_W = √52 ≈ 7.2
(c) L = 0.8x_1 − 2; a = −2, b = 0.8
μ_L = −2 + 0.8μ_1 = −2 + 0.8(115) = 90
σ_L^2 = (0.8)^2 σ_1^2 = 0.64(12)^2 = 92.16
σ_L = √92.16 = 9.6
(d) L = 0.95x_2 − 5; a = −5, b = 0.95
μ_L = −5 + 0.95μ_2 = −5 + 0.95(100) = 90
σ_L^2 = (0.95)^2 σ_2^2 = 0.9025(8)^2 = 57.76
σ_L = √57.76 = 7.6
16.
(a) W = x_1 + x_2; a = 1, b = 1
μ_W = μ_1 + μ_2 = 28.1 + 90.5 = 118.6 minutes
σ_W^2 = σ_1^2 + σ_2^2 = (8.2)^2 + (15.2)^2 = 298.28
σ_W = √298.28 ≈ 17.27 minutes
(b) W = 1.50x_1 + 2.75x_2; a = 1.50, b = 2.75
μ_W = 1.50μ_1 + 2.75μ_2 = 1.50(28.1) + 2.75(90.5) ≈ $291.03
σ_W^2 = (1.50)^2 σ_1^2 + (2.75)^2 σ_2^2 = 2.25(8.2)^2 + 7.5625(15.2)^2 = 1,898.53
σ_W = √1,898.53 ≈ $43.57
(c) L = 1.5x_1 + 50; a = 50, b = 1.5
μ_L = 50 + 1.5μ_1 = 50 + 1.5(28.1) = $92.15
σ_L^2 = (1.5)^2 σ_1^2 = 2.25(8.2)^2 = 151.29
σ_L = √151.29 = $12.30
17. (a) W = 0.5x_1 + 0.5x_2; a = 0.5, b = 0.5
μ_W = 0.5μ_1 + 0.5μ_2 = 0.5(50.2) + 0.5(50.2) = 50.2
σ_W^2 = (0.5)^2 σ_1^2 + (0.5)^2 σ_2^2 = (0.5)^2(11.5)^2 + (0.5)^2(11.5)^2 = 66.125
σ_W = √66.125 ≈ 8.13
(b) Single policy (x_1): μ_1 = 50.2
Two policies (W): μ_W ≈ 50.2
The means are the same.
(c) Single policy (x_1): σ_1 = 11.5
Two policies (W): σ_W ≈ 8.13
The standard deviation for the average of two policies is smaller.
(d) Yes, the risk decreases by a factor of 1/√n because σ_W = (1/√n)σ.

Section 5.2
1. The random variable counts the number of successes that occur in the n trials.
2. The outcome of one trial will not affect the probability of success of any other trial.
3. Binomial experiments have two possible outcomes, denoted success and failure.
4. In a binomial experiment, the probability of success does not change for each trial.
5. (a) No, there must be only two outcomes for each trial. Here, there are three outcomes.
(b) Yes. If we combined outcomes B and C into a single outcome, then we have a binomial experiment. The probability of success for each trial is P(A) = p = 0.40.
6. Yes, the five trials are independent, repeated under the same conditions, and have the same two outcomes (win or not win) and the same probability of success on each trial.
Here, n = 5, p = 1/6, and r = 2.
7. (a) A trial is the random selection of one student and noting whether the student is a freshman or is not a freshman. Here, the probability of success is p = 0.40 and the probability of a failure is 1 − 0.40 = 0.60.
(b) For a small population of size 30, sampling without replacement will alter the probability of drawing a freshman. In this situation, the hypergeometric distribution is appropriate.
8. (a) Yes, 90% of the digits are assigned to “successful surgery”.
(b) S F S S S S S S S S S S S S F
(c) Yes, the assignment is fine. This simulation produces all successes.
9. A trial is one flip of a fair quarter. Success = head. Failure = tail.
n = 3, p = 0.5, q = 1 − 0.5 = 0.5
(a) P(3) = C_{3,3}(0.5)^3(0.5)^(3−3) = 1(0.5)^3(0.5)^0 = 0.125
To find this value in Table 3 of Appendix II, use the group in which n = 3, the column headed by p = 0.5 and the row headed by r = 3.
(b) P(2) = C_{3,2}(0.5)^2(0.5)^(3−2) = 3(0.5)^2(0.5)^1 = 0.375
To find this value in Table 3 of Appendix II, use the group in which n = 3, the column headed by p = 0.5 and the row headed by r = 2.
(c) P(r ≥ 2) = P(2) + P(3) = 0.375 + 0.125 = 0.5
(d) The probability of getting exactly three tails is the same as getting exactly zero heads.
P(0) = C_{3,0}(0.5)^0(0.5)^(3−0) = 1(0.5)^0(0.5)^3 = 0.125
To find this value in Table 3 of Appendix II, use the group in which n = 3, the column headed by p = 0.5 and the row headed by r = 0.
10. A trial is answering a question on the quiz. Success = correct answer. Failure = incorrect answer.
n = 10, p = 1/5 = 0.2, q = 1 − 0.2 = 0.8
(a) P(10) = C_{10,10}(0.2)^10(0.8)^(10−10) = 1(0.2)^10(0.8)^0 = 0.000 (to three digits)
(b) Answering 10 incorrectly is the same as answering 0 correctly.
P(0) = C_{10,0}(0.2)^0(0.8)^(10−0) = 1(0.2)^0(0.8)^10 = 0.107
(c) First method:
P(r ≥ 1) = P(1) + P(2) + P(3) + P(4) + P(5) + P(6) + P(7) + P(8) + P(9) + P(10)
= 0.268 + 0.302 + 0.201 + 0.088 + 0.026 + 0.006 + 0.001 + 0.000 + 0.000 + 0.000 = 0.892
Second method:
P(r ≥ 1) = 1 − P(0) = 1 − 0.107 = 0.893
The two results should be equal, but because of rounding error, they differ slightly.
(d) P(r ≥ 5) = P(5) + P(6) + P(7) + P(8) + P(9) + P(10)
= 0.026 + 0.006 + 0.001 + 0.000 + 0.000 + 0.000 = 0.033
11. A trial consists of determining the sex of a wolf. Success = male. Failure = female.
(a) n = 12, p = 0.55, q = 0.45
P(r ≥ 6) = P(6) + P(7) + P(8) + P(9) + P(10) + P(11) + P(12)
= 0.212 + 0.223 + 0.170 + 0.092 + 0.034 + 0.008 + 0.001 = 0.740
Six or more females is the same as six or fewer males.
P(r ≤ 6) = P(0) + P(1) + P(2) + P(3) + P(4) + P(5) + P(6)
= 0.000 + 0.001 + 0.007 + 0.028 + 0.076 + 0.149 + 0.212 = 0.473
Fewer than four females is the same as more than eight males.
P(r > 8) = P(9) + P(10) + P(11) + P(12) = 0.092 + 0.034 + 0.008 + 0.001 = 0.135
(b) n = 12, p = 0.70, q = 0.30
P(r ≥ 6) = P(6) + P(7) + P(8) + P(9) + P(10) + P(11) + P(12)
= 0.079 + 0.158 + 0.231 + 0.240 + 0.168 + 0.071 + 0.014 = 0.961
P(r ≤ 6) = P(0) + P(1) + P(2) + P(3) + P(4) + P(5) + P(6)
= 0.000 + 0.000 + 0.000 + 0.001 + 0.008 + 0.029 + 0.079 = 0.117
P(r > 8) = P(9) + P(10) + P(11) + P(12) = 0.240 + 0.168 + 0.071 + 0.014 = 0.493
12. A trial is a one-time fling. Success = has done a one-time fling. Failure = has not done a one-time fling.
n = 7, p = 0.10, q = 1 − 0.10 = 0.90
(a) P(0) = C_{7,0}(0.10)^0(0.90)^(7−0) = 1(0.10)^0(0.90)^7 = 0.478
(b) P(r ≥ 1) = 1 − P(0) = 1 − 0.478 = 0.522
(c) P(r ≤ 2) = P(0) + P(1) + P(2) = 0.478 + 0.372 + 0.124 = 0.974
13. A trial consists of a woman’s response regarding her mother-in-law. Success = dislike. Failure = like.
n = 6, p = 0.90, q = 1 − 0.90 = 0.10
(a) P(6) = C_{6,6}(0.90)^6(0.10)^(6−6) = 1(0.90)^6(0.10)^0 = 0.531
(b) P(0) = C_{6,0}(0.90)^0(0.10)^(6−0) = 1(0.90)^0(0.10)^6 ≈ 0.000 (to 3 digits)
(c) P(r ≥ 4) = P(4) + P(5) + P(6) = 0.098 + 0.354 + 0.531 = 0.983
(d) P(r ≤ 3) = 1 − P(r ≥ 4) ≈ 1 − 0.983 = 0.017
From the table:
P(r ≤ 3) = P(0) + P(1) + P(2) + P(3) = 0.000 + 0.000 + 0.001 + 0.015 = 0.016
14. A trial is how a businessman wears a tie. Success = too tight. Failure = not too tight.
n = 20, p = 0.10, q = 1 − 0.10 = 0.90
(a) P(r ≥ 1) = 1 − P(r = 0) = 1 − 0.122 = 0.878
(b) P(r > 2) = 1 − P(r ≤ 2)
= 1 − [P(r = 0) + P(r = 1) + P(r = 2)]
= 1 − P(r = 0) − P(r = 1) − P(r = 2)
= [1 − P(r = 0)] − P(r = 1) − P(r = 2)
= P(r ≥ 1) − P(r = 1) − P(r = 2)
= 0.878 − 0.270 − 0.285 (using (a))
= 0.323
(c) P(r = 0) = 0.122
(d) “At least 18 are not too tight” is the same as “at most 2 are too tight.” (To see this, note that “at least 18 failures” is the same as “18 or 19 or 20 failures,” which is 2, 1, or 0 successes, i.e., at most 2 successes.)
P(r ≤ 2) = 1 − P(r > 2) = 1 − 0.323 (using (b)) = 0.677
15. A trial consists of taking a polygraph examination. Success = pass. Failure = fail.
n = 9, p = 0.85, q = 1 − 0.85 = 0.15
(a) P(9) = 0.232
(b) P(r ≥ 5) = P(5) + P(6) + P(7) + P(8) + P(9) = 0.028 + 0.107 + 0.260 + 0.368 + 0.232 = 0.995
(c) P(r ≤ 4) = 1 − P(r ≥ 5) = 1 − 0.995 = 0.005
From the table:
P(r ≤ 4) = P(0) + P(1) + P(2) + P(3) + P(4) = 0.000 + 0.000 + 0.000 + 0.001 + 0.005 = 0.006
The two results should be equal, but because of rounding error, they differ slightly.
(d) All students failing is the same as no students passing.
P(0) = 0.000 (to 3 digits)
16. A trial consists of checking the gross receipts of the store for one business day. Success = gross over $850. Failure = gross is at or below $850. p = 0.6, q = 1 − p = 0.4.
(a) n = 5
P(r ≥ 3) = P(3) + P(4) + P(5) = 0.346 + 0.259 + 0.078 = 0.683
(b) n = 10
P(r ≥ 6) = P(6) + P(7) + P(8) + P(9) + P(10) = 0.251 + 0.215 + 0.121 + 0.040 + 0.006 = 0.633
(c) n = 10
P(r < 5) = P(0) + P(1) + P(2) + P(3) + P(4) = 0.000 + 0.002 + 0.011 + 0.042 + 0.111 = 0.166
(d) n = 20
P(r < 6) = P(0) + P(1) + P(2) + P(3) + P(4) + P(5) = 0.000 + 0.000 + 0.000 + 0.000 + 0.000 + 0.001 = 0.001
Yes. If p were really 0.60, then the event of a 20-day period with gross income exceeding $850 fewer than 6 days would be very rare. If it happened again, we would suspect that p = 0.60 is too high.
(e) n = 20
P(r > 17) = P(18) + P(19) + P(20) = 0.003 + 0.000 + 0.000 = 0.003
Yes. If p were really 0.60, then the event of a 20-day period with gross income exceeding $850 more than 17 days would be very rare. If it happened again, we would suspect that p = 0.60 is too low.
17. (a) A trial consists of using the Myers-Briggs instrument to determine if a person in marketing is an extrovert. Success = extrovert.
Failure = not extrovert.
n = 15, p = 0.75, q = 1 − 0.75 = 0.25
P(r ≥ 10) = P(10) + P(11) + P(12) + P(13) + P(14) + P(15)
= 0.165 + 0.225 + 0.225 + 0.156 + 0.067 + 0.013 = 0.851
P(r ≥ 5) = P(5) + P(6) + P(7) + P(8) + P(9) + P(r ≥ 10)
= 0.001 + 0.003 + 0.013 + 0.039 + 0.092 + 0.851 = 0.999
P(15) = 0.013
(b) A trial consists of using the Myers-Briggs instrument to determine if a computer programmer is an introvert. Success = introvert. Failure = not introvert.
n = 5, p = 0.60, q = 1 − 0.60 = 0.40
P(0) = 0.010
P(r ≥ 3) = P(3) + P(4) + P(5) = 0.346 + 0.259 + 0.078 = 0.683
18. A trial consists of the response from adults regarding their concern that employers are monitoring phone calls. Success = yes. Failure = no.
p = 0.37, q = 1 − 0.37 = 0.63
(a) n = 5
P(0) = C_{5,0}(0.37)^0(0.63)^(5−0) = 1(0.37)^0(0.63)^5 ≈ 0.099
(b) n = 5
P(5) = C_{5,5}(0.37)^5(0.63)^(5−5) = 1(0.37)^5(0.63)^0 ≈ 0.007
(c) n = 5
P(3) = C_{5,3}(0.37)^3(0.63)^(5−3) = 10(0.37)^3(0.63)^2 ≈ 0.201
19. A trial consists of the response from adults regarding their concern that Social Security numbers are used for general identification. Success = concerned that SS numbers are being used for identification. Failure = not concerned that SS numbers are being used for identification.
n = 8, p = 0.53, q = 1 − 0.53 = 0.47
(a) P(r ≤ 5) = P(0) + P(1) + P(2) + P(3) + P(4) + P(5)
= 0.002381 + 0.021481 + 0.084781 + 0.191208 + 0.269521 + 0.243143 = 0.812515
The cumulative probability, P(r ≤ 5) = 0.81251, is the same value truncated to 5 digits.
(b) P(r > 5) = P(6) + P(7) + P(8) = 0.137091 + 0.044169 + 0.006226 = 0.187486
P(r > 5) = 1 − P(r ≤ 5) = 1 − 0.81251 = 0.18749
Yes, this is the same result rounded to 5 digits.
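The probabilities in Problem 19 can also be checked by direct computation instead of a table or cumulative function. The following is a minimal sketch in Python using only the standard library; the helper name binomial_pmf is an illustrative choice and does not come from the text. It evaluates P(r) = C_{n,r} p^r q^(n−r) for n = 8 and p = 0.53, then sums the terms for r ≤ 5 and for r > 5.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Return P(r) = C(n, r) * p^r * (1 - p)^(n - r) for a binomial experiment."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Values taken from Problem 19: n = 8 trials, probability of success p = 0.53.
n, p = 8, 0.53
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

at_most_5 = sum(pmf[:6])    # P(r <= 5) = P(0) + ... + P(5)
more_than_5 = sum(pmf[6:])  # P(r > 5)  = P(6) + P(7) + P(8)

for r, prob in enumerate(pmf):
    print(f"P({r}) = {prob:.6f}")
print(f"P(r <= 5) = {at_most_5:.5f}")
print(f"P(r > 5)  = {more_than_5:.5f}")
```

Rounded to five places, the two sums should agree with the cumulative values used in parts (a) and (b), and they add to 1 up to floating-point error, which is the complement rule P(r > 5) = 1 − P(r ≤ 5).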
20. A trial consists of an office visit.
(a) Success = visitor age is under 15 years old. Failure = visitor age is 15 years old or older.
n = 8, p = 0.20, q = 1 − 0.20 = 0.80
P(r ≥ 4) = P(4) + P(5) + P(6) + P(7) + P(8) = 0.046 + 0.009 + 0.001 + 0.000 + 0.000 = 0.056
(b) Success = visitor age is 65 years old or older. Failure = visitor age is under 65 years old.
n = 8, p = 0.25, q = 1 − 0.25 = 0.75
P(2 ≤ r ≤ 5) = P(2) + P(3) + P(4) + P(5) = 0.311 + 0.208 + 0.087 + 0.023 = 0.629
(c) Success = visitor age is 45 years old or older. Failure = visitor age is less than 45 years old.
n = 8, p = 0.20 + 0.25 = 0.45, q = 1 − 0.45 = 0.55
P(2 ≤ r ≤ 5) = P(2) + P(3) + P(4) + P(5) = 0.157 + 0.257 + 0.263 + 0.172 = 0.849
(d) Success = visitor age is under 25 years old. Failure = visitor age is 25 years old or older.
n = 8, p = 0.20 + 0.10 = 0.30, q = 1 − 0.30 = 0.70
P(8) = 0.000 (to 3 digits)
(e) Success = visitor age is 15 years old or older. Failure = visitor age is under 15 years old.
n = 8, p = 0.10 + 0.25 + 0.20 + 0.25 = 0.80, q = 0.20
P(8) = 0.168
21. (a) p = 0.30, P(3) = 0.132
p = 0.70, P(2) = 0.132
They are the same.
(b) p = 0.30, P(r ≥ 3) = 0.132 + 0.028 + 0.002 = 0.162
p = 0.70, P(r ≤ 2) = 0.002 + 0.028 + 0.132 = 0.162
They are the same.
(c) p = 0.30, P(4) = 0.028
p = 0.70, P(1) = 0.028
r = 1
(d) The column headed by p = 0.80 is symmetric with the one headed by p = 0.20.
22. n = 3, p = 0.0228, q = 1 − p = 0.9772
(a) P(2) = C_{3,2} p^2 q^(3−2) = 3(0.0228)^2(0.9772)^1 = 0.00152
(b) P(3) = C_{3,3} p^3 q^(3−3) = 1(0.0228)^3(0.9772)^0 = 0.00001
(c) P(2 or 3) = P(2) + P(3) = 0.00153
23.
(a) n = 8; p = 0.65
P(6 ≤ r, given 4 ≤ r) = P(6 ≤ r and 4 ≤ r)/P(4 ≤ r) = P(6 ≤ r)/P(4 ≤ r)
= [P(6) + P(7) + P(8)]/[P(4) + P(5) + P(6) + P(7) + P(8)]
= (0.259 + 0.137 + 0.032)/(0.188 + 0.279 + 0.259 + 0.137 + 0.032)
= 0.428/0.895 = 0.478
(b) n = 10; p = 0.65
P(8 ≤ r, given 6 ≤ r) = P(8 ≤ r and 6 ≤ r)/P(6 ≤ r) = P(8 ≤ r)/P(6 ≤ r)
= [P(8) + P(9) + P(10)]/[P(6) + P(7) + P(8) + P(9) + P(10)]
= (0.176 + 0.072 + 0.014)/(0.238 + 0.252 + 0.176 + 0.072 + 0.014)
= 0.262/0.752 = 0.348
(c) Answers vary. Possibilities include the stock market, getting raises at work, or passing exams.
(d) Let event A = 6 ≤ r and event B = 4 ≤ r in the formula.
24. (a) n = 10; p = 0.70
P(8 ≤ r, given 6 ≤ r) = P(8 ≤ r and 6 ≤ r)/P(6 ≤ r) = P(8 ≤ r)/P(6 ≤ r)
= [P(8) + P(9) + P(10)]/[P(6) + P(7) + P(8) + P(9) + P(10)]
= (0.233 + 0.121 + 0.028)/(0.200 + 0.267 + 0.233 + 0.121 + 0.028)
= 0.382/0.849 = 0.450
(b) n = 10; p = 0.70
P(r = 10, given 6 ≤ r) = P(r = 10 and 6 ≤ r)/P(6 ≤ r) = P(r = 10)/P(6 ≤ r) = 0.028/0.849 = 0.033

Section 5.3
1. The average number of successes in n trials.
2. The expected value of the first distribution will be higher.
3. (a) Yes, 120 is more than 2.5 standard deviations above the mean.
(b) Yes, 40 is more than 2.5 standard deviations below the mean.
(c) No, the entire interval is within 2.5 standard deviations above and below the mean.
4. (a) At p = 0.50, the distribution is symmetric. The expected value is r = 5. Yes, the distribution is centered at r = 5.
(b) The distribution is skewed right.
(c) The distribution is skewed left.
5. (a) The distribution is symmetric.
[Histogram: n = 5, p = 0.50; vertical axis: Probability.]
(b) The distribution is skewed right.
[Histogram: n = 5, p = 0.25; vertical axis: Probability.]
(c) The distribution is skewed left.
[Histogram: n = 5, p = 0.75; vertical axis: Probability.]
(d) The distributions are mirror images of one another.
(e) The distribution would be skewed left for p = 0.73 because p > 0.50.
6. (a) p = 0.30 goes with graph II because it is slightly skewed right.
(b) p = 0.50 goes with graph I because it is symmetric.
(c) p = 0.65 goes with graph III because it is slightly skewed left.
(d) p = 0.90 goes with graph IV because it is skewed left.
(e) The graph is approximately symmetric when p is close to 0.5. The graph is skewed left when p is close to 1 and skewed right when p is close to 0.
7. Minitab was used to generate the distribution.
(a) n = 10, p = 0.80
[Histogram: n = 10, p = 0.80; vertical axis: Probability.]
μ = np = 10(0.8) = 8
σ = √(npq) = √(10(0.8)(0.2)) ≈ 1.26
(b) n = 10, p = 0.5
[Histogram: n = 10, p = 0.50; vertical axis: Probability.]
μ = np = 10(0.5) = 5
σ = √(npq) = √(10(0.5)(0.5)) ≈ 1.58
(c) Yes; since the graph in part (a) is skewed left, it supports the claim that more households that have children under 2 years of age buy film than households that have no children under 21 years of age.
8. (a) n = 8, p = 0.01
[Histogram: n = 8, p = 0.01; vertical axis: Probability.]
(b) μ = np = 8(0.01) = 0.08
The expected number of defective syringes the inspector will find is 0.08.
(c) The batch will be accepted if fewer than two defectives are found.
P(r < 2) = P(0) + P(1) = 0.923 + 0.075 = 0.998
(d) σ = √(npq) = √(8(0.01)(0.99)) ≈ 0.281
9. (a) n = 6, p = 0.70
[Histogram: n = 6, p = 0.70; vertical axis: Probability.]
(b) μ = np = 6(0.70) = 4.2
σ = √(npq) = √(6(0.70)(0.30)) ≈ 1.122
We expect 4.2 friends to be found.
(c) Find n such that P(r ≥ 2) = 0.97. Try n = 5.
P(r ≥ 2) = P(2) + P(3) + P(4) + P(5) = 0.132 + 0.309 + 0.360 + 0.168 = 0.969 ≈ 0.97
You would have to submit five names to be 97% sure that at least two addresses will be found.
If you solve this problem as
P(r ≥ 2) = 1 − P(r < 2) = 1 − [P(r = 0) + P(r = 1)] = 1 − 0.002 − 0.028 = 0.97
the answers differ owing to rounding error in the table.
10. (a) n = 5, p = 0.85
[Histogram: n = 5, p = 0.85; vertical axis: Probability.]
(b) μ = np = 5(0.85) = 4.25
σ = √(npq) = √(5(0.85)(0.15)) ≈ 0.798
For samples of size 5, the expected number of claims made by people under 25 years of age is about 4.
11.
(a) n = 7, p = 0.20
[Histogram: n = 7, p = 0.20; vertical axis: Probability.]
(b) μ = np = 7(0.20) = 1.4
σ = √(npq) = √(7(0.20)(0.80)) ≈ 1.058
We expect 1.4 people to be illiterate.
(c) Let success = literate and p = 0.80. Find n such that P(r ≥ 7) = 0.98. Try n = 12.
P(r ≥ 7) = P(7) + P(8) + P(9) + P(10) + P(11) + P(12)
= 0.053 + 0.133 + 0.236 + 0.283 + 0.206 + 0.069 = 0.98
You would need to interview 12 people to be 98% sure that at least 7 of these people are not illiterate.
12. (a) n = 12, p = 0.35
[Histogram: n = 12, p = 0.35; vertical axis: Probability.]
(b) μ = np = 12(0.35) = 4.2
The expected number of vehicles out of 12 that will tailgate is 4.2.
(c) σ = √(npq) = √(12(0.35)(0.65)) ≈ 1.65
13. (a) n = 8, p = 0.25
[Histogram: n = 8, p = 0.25; vertical axis: Probability.]
(b) μ = np = 8(0.25) = 2
σ = √(npq) = √(8(0.25)(0.75)) ≈ 1.225
We expect two people to believe that the product is actually improved.
(c) Find n such that P(r ≥ 1) = 0.99. Try n = 16.
P(r ≥ 1) = 1 − P(0) = 1 − 0.01 = 0.99
Sixteen people are needed in the marketing study to be 99% sure that at least one person believes the product to be improved.
14. p = 0.10
Find n such that P(r ≥ 1) = 0.90.
From a calculator or a computer, we determine n = 22 gives P(r ≥ 1) = 0.9015.
15. (a) Since success = not a repeat offender, then p = 0.75.
r       0      1      2      3      4
P(r)    0.004  0.047  0.211  0.422  0.316
[Histogram: n = 4, p = 0.75; vertical axis: Probability.]
(c) μ = np = 4(0.75) = 3
We expect three parolees to not repeat offend.
σ = √(npq) = √(4(0.75)(0.25)) ≈ 0.866
(d) Find n such that P(r ≥ 3) = 0.98. Try n = 7.
P(r ≥ 3) = P(3) + P(4) + P(5) + P(6) + P(7) = 0.058 + 0.173 + 0.311 + 0.311 + 0.133 = 0.986
This is slightly higher than needed, but n = 6 yields P(r ≥ 3) = 0.963.
Alice should have a group of seven to be about 98% sure that three or more will not become repeat offenders.
16. (a) p = 0.65
Find n such that P(r ≥ 1) = 0.98. Try n = 4.
P(r ≥ 1) = 1 − P(0) = 1 − 0.015 = 0.985
Four stations are required to be 98% certain that an enemy plane flying over will be detected by at least one station.
(b) n = 4, p = 0.65
μ = np = 4(0.65) = 2.6
If four stations are in use, we expect 2.6 stations to detect an enemy plane.
17. (a) Let success = available, then p = 0.75, n = 12.
P(12) = 0.032
(b) Let success = not available, then p = 0.25, n = 12.
P(r ≥ 6) = P(6) + P(7) + P(8) + P(9) + P(10) + P(11) + P(12)
= 0.040 + 0.011 + 0.002 + 0.000 + 0.000 + 0.000 + 0.000 = 0.053
(c) n = 12, p = 0.75
μ = np = 12(0.75) = 9
The expected number of those available to serve on the jury is nine.
σ = √(npq) = √(12(0.75)(0.25)) = 1.5
(d) p = 0.75
Find n such that P(r ≥ 12) = 0.959. Try n = 20.
P(r ≥ 12) = P(12) + P(13) + P(14) + P(15) + P(16) + P(17) + P(18) + P(19) + P(20)
= 0.061 + 0.112 + 0.169 + 0.202 + 0.190 + 0.134 + 0.067 + 0.021 + 0.003 = 0.959
The jury commissioner must contact 20 people to be 95.9% sure of finding at least 12 people who are available to serve.
18. (a) Let success = emergency, then p = 0.15, n = 4.
P(4) = 0.001
(b) Let success = not emergency, then p = 0.85, n = 4.
P(r ≥ 3) = P(3) + P(4) = 0.368 + 0.522 = 0.890
(c) p = 0.15. Find n such that P(r ≥ 1) = 0.96. Try n = 20.
P(r ≥ 1) = 1 − P(0) = 1 − 0.039 = 0.961
The operators need to answer 20 calls to be 96% (or more) sure that at least one call was in fact an emergency.
19. Let success = case solved, then p = 0.2, n = 6.
(a) P(0) = 0.262
(b) P(r ≥ 1) = 1 − P(0) = 1 − 0.262 = 0.738
(c) µ = np = 6(0.20) = 1.2. The expected number of crimes that will be solved is 1.2. σ = √(npq) = √(6(0.20)(0.80)) ≈ 0.98
(d) Find n such that P(r ≥ 1) = 0.90. Try n = 11.
P(r ≥ 1) = 1 − P(0) = 1 − 0.086 = 0.914 [Note: For n = 10, P(r ≥ 1) = 0.893.]
The police must investigate 11 property crimes before they can be at least 90% sure of solving one or more cases.
20. (a) p = 0.55. Find n such that P(r ≥ 1) = 0.99. Try n = 6.
P(r ≥ 1) = 1 − P(0) = 1 − 0.008 = 0.992
Six alarms should be used to be 99% certain that a burglar trying to enter is detected by at least one alarm.
(b) n = 9, p = 0.55; µ = np = 9(0.55) = 4.95. The expected number of alarms that would detect a burglar is about five.
21. (a) Japan: n = 7, p = 0.95. P(7) = 0.698
United States: n = 7, p = 0.60. P(7) = 0.028
(b) Japan: n = 7, p = 0.95. µ = np = 7(0.95) = 6.65; σ = √(npq) = √(7(0.95)(0.05)) ≈ 0.58
United States: n = 7, p = 0.60. µ = np = 7(0.60) = 4.2; σ = √(npq) = √(7(0.60)(0.40)) ≈ 1.30
The expected number of guilty verdicts in Japan is 6.65, and in the United States it is 4.2.
(c) United States: p = 0.60. Find n such that P(r ≥ 2) = 0.99. Try n = 8.
P(r ≥ 2) = 1 − [P(0) + P(1)] = 1 − (0.001 + 0.008) = 0.991
Japan: p = 0.95. Find n such that P(r ≥ 2) = 0.99. Try n = 3.
P(r ≥ 2) = P(2) + P(3) = 0.135 + 0.857 = 0.992
Cover eight trials in the United States and three trials in Japan.
22. n = 6, p = 0.45
(a) P(6) = 0.008
(b) P(0) = 0.028
(c) P(r ≥ 2) = P(2) + P(3) + P(4) + P(5) + P(6) = 0.278 + 0.303 + 0.186 + 0.061 + 0.008 = 0.836
(d) µ = np = 6(0.45) = 2.7. The expected number is 2.7. σ = √(npq) = √(6(0.45)(0.55)) ≈ 1.219
(e) Find n such that P(r ≥ 3) = 0.90. Try n = 10.
P(r ≥ 3) = 1 − [P(0) + P(1) + P(2)] = 1 − (0.003 + 0.021 + 0.076) = 1 − 0.100 = 0.900
You need to interview 10 professors to be at least 90% sure of filling the quota.
23. (a) p = 0.40. Find n such that P(r ≥ 1) = 0.99. Try n = 9.
P(r ≥ 1) = 1 − P(0) = 1 − 0.010 = 0.990
The owner must answer nine inquiries to be 99% sure of renting at least one room.
(b) n = 25, p = 0.40; µ = np = 25(0.40) = 10. The expected number is 10 room rentals.
24. (a) Out of n trials, there can be 0 through n successes. The sum of the probabilities for all members of the sample space must be 1.
(b) r ≥ 1 consists of all members of the sample space except r = 0.
(c) r ≥ 2 consists of all members of the sample space except r = 0 and r = 1.
(d) r ≥ m consists of all members of the sample space except for r values between 0 and m − 1.
Section 5.4
1. The geometric distribution
2. The mean or expected value, denoted by λ.
3. No, since the approximation requires n ≥ 100.
4. Yes. We have np = 150 × 0.02 = 3 ≤ 10 and n = 150 ≥ 100.
5. (a) Geometric probability distribution, p = 0.77. P(n) = p(1 − p)^(n−1); P(n) = (0.77)(0.23)^(n−1)
(b) P(1) = (0.77)(0.23)^(1−1) = (0.77)(0.23)^0 = 0.77
(c) P(2) = (0.77)(0.23)^(2−1) = (0.77)(0.23)^1 = 0.1771
(d) P(3 or more tries) = 1 − P(1) − P(2) = 1 − 0.77 − 0.1771 = 0.0529
(e) µ = 1/p = 1/0.77 ≈ 1.29. The expected number is 1.29, or 1, attempt to pass.
6.
(a) Geometric probability distribution, p = 0.57. P(n) = p(1 − p)^(n−1); P(n) = (0.57)(0.43)^(n−1)
(b) P(2) = (0.57)(0.43)^(2−1) = (0.57)(0.43)^1 = 0.2451
(c) P(3) = (0.57)(0.43)^(3−1) = (0.57)(0.43)^2 ≈ 0.1054
(d) P(more than 3 attempts) = 1 − P(1) − P(2) − P(3) = 1 − 0.57 − 0.2451 − 0.1054 = 0.0795
(e) µ = 1/p = 1/0.57 ≈ 1.75. The expected number is 1.75, or 2, attempts to pass.
7. (a) Geometric probability distribution, p = 0.80. P(n) = p(1 − p)^(n−1); P(n) = (0.80)(0.20)^(n−1)
(b) P(1) = (0.80)(0.20)^(1−1) = 0.80; P(2) = (0.80)(0.20)^(2−1) = 0.16; P(3) = (0.80)(0.20)^(3−1) = 0.032
(c) P(n ≥ 4) = 1 − P(1) − P(2) − P(3) = 1 − 0.80 − 0.16 − 0.032 = 0.008
(d) P(n) = (0.04)(0.96)^(n−1); P(1) = (0.04)(0.96)^(1−1) = 0.04; P(2) = (0.04)(0.96)^(2−1) = 0.0384; P(3) = (0.04)(0.96)^(3−1) = 0.0369
P(n ≥ 4) = 1 − P(1) − P(2) − P(3) = 1 − 0.04 − 0.0384 − 0.0369 = 0.8847
8. (a) Geometric probability distribution, p = 0.036. P(n) = p(1 − p)^(n−1); P(n) = (0.036)(0.964)^(n−1)
(b) P(3) = (0.036)(0.964)^(3−1) ≈ 0.03345; P(5) = (0.036)(0.964)^(5−1) ≈ 0.0311; P(12) = (0.036)(0.964)^(12−1) ≈ 0.0241
(c) P(n ≥ 5) = 1 − P(1) − P(2) − P(3) − P(4) = 1 − 0.036 − (0.036)(0.964) − (0.036)(0.964)^2 − (0.036)(0.964)^3 = 1 − 0.036 − 0.0347 − 0.03345 − 0.03225 = 0.8636
(d) µ = 1/p = 1/0.036 ≈ 27.8. The expected number is 27.8, or 28, apples.
9. (a) Geometric probability distribution, p = 0.30.
P(n) = p(1 − p)^(n−1); P(n) = (0.30)(0.70)^(n−1)
(b) P(3) = (0.30)(0.70)^(3−1) = 0.147
(c) P(n > 3) = 1 − P(1) − P(2) − P(3) = 1 − 0.30 − (0.30)(0.70) − 0.147 = 1 − 0.30 − 0.21 − 0.147 = 0.343
(d) µ = 1/p = 1/0.30 ≈ 3.33. The expected number is 3.33, or 3, trips.
10. (a) The Poisson distribution would be a good choice because finding prehistoric artifacts is a relatively rare occurrence. It is reasonable to assume that the events are independent and that the variable is the number of artifacts found in a fixed amount of sediment.
λ = (1.5/10 L) · (5/5) = 7.5/50 L; λ = 7.5 per 50 liters
P(r) = e^(−λ) λ^r / r!; P(r) = e^(−7.5) (7.5)^r / r!
(b) P(2) = e^(−7.5) (7.5)^2 / 2! ≈ 0.0156; P(3) = e^(−7.5) (7.5)^3 / 3! ≈ 0.0389; P(4) = e^(−7.5) (7.5)^4 / 4! ≈ 0.0729
(c) P(r ≥ 3) = 1 − P(0) − P(1) − P(2) = 1 − 0.0006 − 0.0041 − 0.0156 = 0.9797
(d) P(r < 3) = P(0) + P(1) + P(2) = 0.0006 + 0.0041 + 0.0156 = 0.0203, or P(r < 3) = 1 − P(r ≥ 3) = 1 − 0.9797 = 0.0203
11. (a) The Poisson distribution would be a good choice because frequency of initiating social grooming is a relatively rare occurrence. It is reasonable to assume that the events are independent and that the variable is the number of times that one otter initiates social grooming in a fixed time interval.
λ = (1.7/10 min) · (3/3) = 5.1/30 min; λ = 5.1 per 30-minute interval
P(r) = e^(−λ) λ^r / r!; P(r) = e^(−5.1) (5.1)^r / r!
(b) P(4) = e^(−5.1) (5.1)^4 / 4! ≈ 0.1719; P(5) = e^(−5.1) (5.1)^5 / 5! ≈ 0.1753; P(6) = e^(−5.1) (5.1)^6 / 6! ≈ 0.1490
(c) P(r ≥ 4) = 1 − P(0) − P(1) − P(2) − P(3) = 1 − 0.0061 − 0.0311 − 0.0793 − 0.1348 = 0.7487
(d) P(r < 4) = P(0) + P(1) + P(2) + P(3) = 0.0061 + 0.0311 + 0.0793 + 0.1348 = 0.2513, or P(r < 4) = 1 − P(r ≥ 4) = 1 − 0.7487 = 0.2513
12.
(a) The Poisson distribution would be a good choice because frequency of shoplifting is a relatively rare occurrence. It is reasonable to assume that the events are independent and that the variable is the number of incidents in a fixed time interval.
λ = (1/3 hours) · ((11/3)/(11/3)) = (11/3)/(11 hours); λ = 11/3 ≈ 3.7 per 11 hours (rounded to nearest tenth)
(b) P(r ≥ 1) = 1 − P(0) = 1 − 0.0247 = 0.9753
(c) P(r ≥ 3) = 1 − P(0) − P(1) − P(2) = 1 − 0.0247 − 0.0915 − 0.1692 = 0.7146
(d) P(0) = 0.0247
13. (a) The Poisson distribution would be a good choice because frequency of births is a relatively rare occurrence. It is reasonable to assume that the events are independent and that the variable is the number of births (or deaths) for a community of a given population size.
(b) For 1,000 people, λ = 16 births; λ = 8 deaths. By Table 4 in Appendix II:
P(10 births) = 0.0341; P(10 deaths) = 0.0993; P(16 births) = 0.0992; P(16 deaths) = 0.0045
(c) For 1,500 people, λ = (16/1,000) · (1.5/1.5) = 24/1,500; λ = 24 births per 1,500 people
λ = (8/1,000) · (1.5/1.5) = 12/1,500; λ = 12 deaths per 1,500 people
By Table 4 in Appendix II or a calculator:
P(10 births) = 0.00066; P(10 deaths) = 0.1048; P(16 births) = 0.02186; P(16 deaths) = 0.0543
(d) For 750 people, λ = (16/1,000) · (0.75/0.75) = 12/750; λ = 12 births per 750 people
λ = (8/1,000) · (0.75/0.75) = 6/750; λ = 6 deaths per 750 people
P(10 births) = 0.1048; P(10 deaths) = 0.0413; P(16 births) = 0.0543; P(16 deaths) = 0.0003
14. (a) The Poisson distribution would be a good choice because frequency of hairline cracks is a relatively rare occurrence. It is reasonable to assume that the events are independent and that the variable is the number of hairline cracks for a given length of retaining wall.
(b) λ = (4.2/30 ft) · ((5/3)/(5/3)) = 7/50 ft; λ = 7 per 50 ft
From Table 4 in Appendix II: P(3) = 0.0521
P(r ≥ 3) = 1 − P(0) − P(1) − P(2) = 1 − 0.0009 − 0.0064 − 0.0223 = 0.9704
(c) λ = (4.2/30 ft) · ((2/3)/(2/3)) = 2.8/20 ft; λ = 2.8 per 20 ft
P(3) = 0.2225
P(r ≥ 3) = 1 − P(0) − P(1) − P(2) = 1 − 0.0608 − 0.1703 − 0.2384 = 0.5305
(d) λ = (4.2/30 ft) · ((1/15)/(1/15)) = 0.28/2 ft; λ ≈ 0.3 per 2 ft
P(3) = 0.0033
P(r ≥ 3) = 1 − P(0) − P(1) − P(2) = 1 − 0.7408 − 0.2222 − 0.0333 = 0.0037
(e) Three hairline cracks spread out evenly over a 50-foot section is no cause for concern. For part (c), we expect 2.8 cracks per 20 feet, so actually having 3 cracks is nothing unusual. For part (d), actually having 3 cracks is unusual and a cause for concern.
15. (a) The Poisson distribution would be a good choice because frequency of gale-force winds is a relatively rare occurrence. It is reasonable to assume that the events are independent and that the variable is the number of gale-force winds in a given time interval.
(b) λ = (1/60 hours) · (1.8/1.8) = 1.8/108 hours; λ = 1.8 per 108 hours
From Table 4 in Appendix II: P(2) = 0.2678; P(3) = 0.1607; P(4) = 0.0723
P(r < 2) = P(0) + P(1) = 0.1653 + 0.2975 = 0.4628
(c) λ = (1/60 hours) · (3/3) = 3/180 hours; λ = 3 per 180 hours
P(3) = 0.2240; P(4) = 0.1680; P(5) = 0.1008
P(r < 3) = P(0) + P(1) + P(2) = 0.0498 + 0.1494 + 0.2240 = 0.4232
16. (a) The Poisson distribution would be a good choice because frequency of earthquakes is a relatively rare occurrence. It is reasonable to assume that the events are independent and that the variable is the number of earthquakes in a given time interval.
(b) λ = 1.00 per 22 years; P(r ≥ 1) = 0.6321
(c) λ = 1.00 per 22 years; P(0) = 0.3679
(d) λ = (1/22 years) · ((25/11)/(25/11)) ≈ 2.27/50 years; λ = 2.27 per 50 years; P(r ≥ 1) = 0.8967
(e) λ = 2.27 per 50 years; P(0) = 0.1033
17. (a) The Poisson distribution would be a good choice because frequency of commercial building sales is a relatively rare occurrence. It is reasonable to assume that the events are independent and the variable is the number of buildings sold in a given time interval.
(b) λ = (8/275 days) · ((12/55)/(12/55)) = (96/55)/60 days; λ = 96/55 ≈ 1.7 per 60 days
From Table 4 in Appendix II: P(0) = 0.1827; P(1) = 0.3106
P(r ≥ 2) = 1 − P(0) − P(1) = 1 − 0.1827 − 0.3106 = 0.5067
(c) λ = (8/275 days) · ((18/55)/(18/55)) ≈ 2.6/90 days; λ ≈ 2.6 per 90 days
P(0) = 0.0743; P(2) = 0.2510
P(r ≥ 3) = 1 − P(0) − P(1) − P(2) = 1 − 0.0743 − 0.1931 − 0.2510 = 0.4816
18. (a) The problem satisfies the conditions for a binomial experiment with n large, n = 316, and p small, p = 661/100,000 = 0.00661. np = 316(0.00661) ≈ 2.1 < 10. The Poisson distribution would be a good approximation to the binomial. n = 316, p = 0.00661, λ = np ≈ 2.1
(b) From Table 4 in Appendix II, P(0) = 0.1225.
(c) P(r ≤ 1) = P(0) + P(1) = 0.1225 + 0.2572 = 0.3797
(d) P(r ≥ 2) = 1 − P(r ≤ 1) = 1 − 0.3797 = 0.6203
19. (a) The problem satisfies the conditions for a binomial experiment with n large, n = 1000, and p small, p = 1/569 ≈ 0.0018. np ≈ 1000(0.0018) = 1.8 < 10. The Poisson distribution would be a good approximation to the binomial. λ = np ≈ 1.8
(b) From Table 4 in Appendix II, P(0) = 0.1653.
(c) P(r > 1) = 1 − P(0) − P(1) = 1 − 0.1653 − 0.2975 = 0.5372
(d) P(r > 2) = P(r > 1) − P(2) = 0.5372 − 0.2678 = 0.2694
(e) P(r > 3) = P(r > 2) − P(3) = 0.2694 − 0.1607 = 0.1087
20.
(a) The Poisson distribution would be a good choice because frequency of lost bags is a relatively rare occurrence. It is reasonable to assume that the events are independent and the variable is the number of bags lost per 1,000 passengers. λ = 6.02 or 6.0 per 1,000 passengers
(b) From Table 4 in Appendix II, P(0) = 0.0025
P(r ≥ 3) = 1 − P(0) − P(1) − P(2) = 1 − 0.0025 − 0.0149 − 0.0446 = 0.9380
P(r ≥ 6) = P(r ≥ 3) − P(3) − P(4) − P(5) = 0.9380 − 0.0892 − 0.1339 − 0.1606 = 0.5543
(c) λ = 13.0 per 1,000 passengers
P(0) = 0.000 (to 3 digits)
P(r ≥ 6) = 1 − P(r ≤ 5) = 1 − 0.0107 = 0.9893
P(r ≥ 12) = 1 − P(r ≤ 11) = 1 − 0.3532 = 0.6468
21. (a) The problem satisfies the conditions for a binomial experiment with n large, n = 175, and p small, p = 0.005. np = (175)(0.005) = 0.875 < 10. The Poisson distribution would be a good approximation to the binomial. n = 175, p = 0.005, λ = np ≈ 0.9.
(b) From Table 4 in Appendix II, P(0) = 0.4066.
(c) P(r ≥ 1) = 1 − P(0) = 1 − 0.4066 = 0.5934
(d) P(r ≥ 2) = P(r ≥ 1) − P(1) = 0.5934 − 0.3659 = 0.2275
22. (a) The problem satisfies the conditions for a binomial experiment with n large, n = 137, and p small, p = 0.02. np = (137)(0.02) = 2.74 < 10. The Poisson distribution would be a good approximation to the binomial. n = 137, p = 0.02, λ = np = 2.74 ≈ 2.7.
(b) From Table 4 in Appendix II, P(0) = 0.0672.
(c) P(r ≥ 2) = 1 − P(0) − P(1) = 1 − 0.0672 − 0.1815 = 0.7513
(d) P(r ≥ 4) = P(r ≥ 2) − P(2) − P(3) = 0.7513 − 0.2450 − 0.2205 = 0.2858
23. (a) n = 100, p = 0.02, r = 2
P(r) = C(n, r) p^r (1 − p)^(n−r); P(2) = C(100, 2)(0.02)^2(0.98)^(100−2) = 4950(0.0004)(0.1381) = 0.2734
(b) λ = np = 100(0.02) = 2. From Table 4 in Appendix II, P(2) = 0.2707.
(c) Yes, the approximation is correct to two decimal places.
(d) n = 100; p = 0.02; r = 3. By the formula for the binomial distribution,
P(3) = C(100, 3)(0.02)^3(0.98)^(100−3) = 161,700(0.000008)(0.1409) = 0.1823
By the Poisson approximation, λ = 2, P(3) = 0.1804. This is correct to two decimal places.
24. (a) The Poisson distribution would be a good choice because frequency of fish caught is a relatively rare occurrence. It is reasonable to assume that the events are independent and that the variable is the number of fish caught in the 8-hour period.
λ = (0.667/1 hour) · (8/8) ≈ 5.3/8 hours; λ ≈ 5.3 fish caught per 8 hours
(b) P(r ≥ 7, given r ≥ 3) = P(r ≥ 7 and r ≥ 3) / P(r ≥ 3) = P(r ≥ 7) / P(r ≥ 3) = P(r ≥ 7) / [1 − P(r ≤ 2)]
= (0.1163 + 0.0771 + 0.0454 + 0.0241 + 0.0116 + 0.0051 + 0.0021 + 0.0008 + 0.0003 + 0.0001) / [1 − (0.0050 + 0.0265 + 0.0701)]
= 0.2829 / 0.8984 ≈ 0.3149
(c) P(r < 9, given r ≥ 4) = P(r < 9 and r ≥ 4) / P(r ≥ 4) = [P(r = 4) + P(r = 5) + P(r = 6) + P(r = 7) + P(r = 8)] / [1 − P(r ≤ 3)]
= (0.1641 + 0.1740 + 0.1537 + 0.1163 + 0.0771) / [1 − (0.0050 + 0.0265 + 0.0701 + 0.1239)]
= 0.6852 / 0.7745 ≈ 0.8847
(d) Possibilities include the fields of agriculture, the military, and business.
25. (a) The Poisson distribution would be a good choice because hail storms in western Kansas are relatively rare occurrences. It is reasonable to assume that the events are independent and that the variable is the number of hailstorms in a fixed-square-mile area.
λ = (2.1/5) · ((8/5)/(8/5)) = 2.1(8/5)/8 ≈ 3.4/8; λ = 3.4 storms per 8 square miles
(b) P(r ≥ 4, given r ≥ 2) = P(r ≥ 4 and r ≥ 2) / P(r ≥ 2) = P(r ≥ 4) / P(r ≥ 2) = [1 − P(r ≤ 3)] / [1 − P(r ≤ 1)]
= [1 − (0.0334 + 0.1135 + 0.1929 + 0.2186)] / [1 − (0.0334 + 0.1135)]
= 0.4416 / 0.8531 = 0.5176
(c) P(r < 6, given r ≥ 3) = P(r < 6 and r ≥ 3) / P(r ≥ 3) = [P(r = 3) + P(r = 4) + P(r = 5)] / [1 − P(r ≤ 2)]
= (0.2186 + 0.1858 + 0.1264) / [1 − (0.0334 + 0.1135 + 0.1929)]
= 0.5308 / 0.6602 = 0.8040
26. (a) P(n) = C(n − 1, k − 1) p^k q^(n−k); P(n) = C(n − 1, 3)(0.65)^4(0.35)^(n−4)
(b) P(4) = C(3, 3)(0.65)^4(0.35)^0 ≈ 0.1785
P(5) = C(4, 3)(0.65)^4(0.35)^1 ≈ 0.2499
P(6) = C(5, 3)(0.65)^4(0.35)^2 ≈ 0.2187
P(7) = C(6, 3)(0.65)^4(0.35)^3 ≈ 0.1531
(c) P(4 ≤ n ≤ 7) = P(4) + P(5) + P(6) + P(7) = 0.1785 + 0.2499 + 0.2187 + 0.1531 = 0.8002
(d) P(n ≥ 8) = 1 − P(n ≤ 7) = 1 − P(4 ≤ n ≤ 7) = 1 − 0.8002 = 0.1998
(e) µ = k/p = 4/0.65 ≈ 6.15; σ = √(kq)/p = √(4(0.35))/0.65 ≈ 1.82
The expected year in which the fourth successful crop occurs is 6.15, with a standard deviation of 1.82.
27. (a) We have binomial trials for which the probability of success is p = 0.80 and failure is q = 0.20; k = 12 is a fixed whole number ≥ 1; n is a random variable representing the number of contacts needed to get the twelfth sale.
P(n) = C(n − 1, k − 1) p^k q^(n−k); P(n) = C(n − 1, 11)(0.80)^12(0.20)^(n−12)
(b) P(12) = C(11, 11)(0.80)^12(0.20)^0 ≈ 0.0687
P(13) = C(12, 11)(0.80)^12(0.20)^1 ≈ 0.1649
P(14) = C(13, 11)(0.80)^12(0.20)^2 ≈ 0.2144
(c) P(12 ≤ n ≤ 14) = P(12) + P(13) + P(14) = 0.0687 + 0.1649 + 0.2144 = 0.4480
(d) P(n > 14) = 1 − P(n ≤ 14) = 1 − P(12 ≤ n ≤ 14) = 1 − 0.4480 = 0.5520
(e) µ = k/p = 12/0.80 = 15; σ = √(kq)/p = √(12(0.20))/0.80 ≈ 1.94
The expected contact in which the twelfth sale will occur is the fifteenth contact, with a standard deviation of 1.94.
28. (a) We have binomial trials for which the probability of success is p = 0.41 and failure is q = 0.59; k = 3 is a fixed whole number ≥ 1; and n is a random variable representing the number of donors needed to provide 3 pints of type A blood.
P(n) = C(n − 1, k − 1) p^k q^(n−k); P(n) = C(n − 1, 2)(0.41)^3(0.59)^(n−3)
(b) P(3) = C(2, 2)(0.41)^3(0.59)^0 ≈ 0.0689
P(4) = C(3, 2)(0.41)^3(0.59)^1 ≈ 0.1220
P(5) = C(4, 2)(0.41)^3(0.59)^2 ≈ 0.1439
P(6) = C(5, 2)(0.41)^3(0.59)^3 ≈ 0.1415
(c) P(3 ≤ n ≤ 6) = P(3) + P(4) + P(5) + P(6) = 0.0689 + 0.1220 + 0.1439 + 0.1415 = 0.4763
(d) P(n > 6) = 1 − P(n ≤ 6) = 1 − P(3 ≤ n ≤ 6) = 1 − 0.4763 = 0.5237
(e) µ = k/p = 3/0.41 ≈ 7.32; σ = √(kq)/p = √(3(0.59))/0.41 ≈ 3.24
The expected number of donors needed to acquire the third pint of type A blood is 7.32, with a standard deviation of 3.24.
29. (a) This is binomial with n − 1 trials and probability of success p. Thus we use the binomial probability distribution: P(A) = C(n − 1, k − 1) p^(k−1) q^((n−1)−(k−1))
(b) P(B) = success on one trial, namely, the nth trial = p (by definition).
(c) By the definition of independent trials, P(A and B) = P(A) × P(B).
(d) P(A and B) = C(n − 1, k − 1) p^(k−1) q^((n−1)−(k−1)) × p = C(n − 1, k − 1) p^k q^(n−k)
(e) The results are the same.
Chapter 5 Review
1. A description of all the values of a random variable x and the associated probability for each value of x. The probabilities sum to 1, and each probability lies between 0 and 1 inclusive.
2. The criteria: a fixed number of trials that are repeated under identical conditions; the trials are independent and have exactly two possible outcomes; and the probability of success on each trial is constant. The random variable counts the number of successes r in n trials.
3. (a) Yes, we expect np = 10 × 0.2 = 2 successes. The standard deviation is σ = 1.26, so the boundary is 2 + (2.5) × (1.26) = 5.15. Thus six successes is unusual.
(b) No, P(x > 5) = 0.0064 (using a TI-83 calculator).
4. As the number of trials n increases, µ = np will increase, and σ = √(npq) also will increase.
5.
(a) µ = ΣxP(x) = 18.5(0.127) + 30.5(0.371) + 42.5(0.285) + 54.5(0.215) + 66.5(0.002) = 37.628 ≈ 37.63
The expected lease term is about 38 months.
σ = √(Σ(x − µ)²P(x)) = √((−19.13)²(0.127) + (−7.13)²(0.371) + (4.87)²(0.285) + (16.87)²(0.215) + (28.87)²(0.002)) ≈ √134.95 ≈ 11.6 (using µ = 37.63 in the calculations)
(b) [Histogram of Length of Lease; horizontal axis: 18.5, 30.5, 42.5, 54.5, 66.5; vertical axis: Percent]
6. (a)
Number Killed by Wolves   Relative Frequency   P(x)
112                       112/296              0.378
53                        53/296               0.179
73                        73/296               0.247
56                        56/296               0.189
2                         2/296                0.007
(b) µ = ΣxP(x) = 0.5(0.378) + 3(0.179) + 8(0.247) + 13(0.189) + 18(0.007) ≈ 5.28 years
σ = √(Σ(x − µ)²P(x)) = √((−4.78)²(0.378) + (−2.28)²(0.179) + (2.72)²(0.247) + (7.72)²(0.189) + (12.72)²(0.007)) = √23.8 ≈ 4.88 years
7. This is a binomial experiment with 10 trials. A trial consists of a claim. Success = submitted by a male under 25 years of age. Failure = not submitted by a male under 25 years of age.
(a) The probabilities can be taken directly from Table 3 in Appendix II: n = 10, p = 0.55.
[Histogram of Claims (Males under 25); horizontal axis: r = 0 to 10; vertical axis: Probability]
(b) P(x ≥ 6) = P(6) + P(7) + P(8) + P(9) + P(10) = 0.504
(c) µ = np = 10(0.55) = 5.5. The expected number of claims made by males under age 25 is 5.5. σ = √(npq) = √(10(0.55)(0.45)) ≈ 1.57
8.
(a) n = 20, p = 0.05
P(r ≤ 2) = P(0) + P(1) + P(2) = 0.358 + 0.377 + 0.189 = 0.924
(b) n = 20, p = 0.15
Probability accepted: P(r ≤ 2) = P(0) + P(1) + P(2) = 0.039 + 0.137 + 0.229 = 0.405
Probability not accepted: 1 − 0.405 = 0.595
9. n = 16, p = 0.50
(a) P(r ≥ 12) = P(12) + P(13) + P(14) + P(15) + P(16) = 0.028 + 0.009 + 0.002 + 0.000 + 0.000 = 0.039
(b) P(r ≤ 7) = P(0) + P(1) + P(2) + P(3) + P(4) + P(5) + P(6) + P(7) = 0.000 + 0.000 + 0.002 + 0.009 + 0.028 + 0.067 + 0.122 + 0.175 = 0.403
(c) µ = np = 16(0.50) = 8. The expected number of inmates serving time for dealing drugs is eight.
10. n = 200, p = 0.80. µ = np = 200(0.80) = 160. We expect 160 flights to arrive on time. σ = √(npq) = √(200(0.80)(0.20)) ≈ 5.66. The standard deviation is 5.66 flights.
11. n = 10, p = 0.75
(a) The probabilities can be obtained directly from Table 3 in Appendix II.
[Histogram of Good Grapefruit; horizontal axis: r = 0 to 10; vertical axis: Probability]
(b) No more than one is bad is the same event as at least nine are good.
P(r ≥ 9) = P(9) + P(10) = 0.188 + 0.056 = 0.244
P(r ≥ 1) = P(1) + P(2) + P(3) + P(4) + P(5) + P(6) + P(7) + P(8) + P(9) + P(10) = 0.000 + 0.000 + 0.003 + 0.016 + 0.058 + 0.146 + 0.250 + 0.282 + 0.188 + 0.056 = 0.999
(c) µ = np = 10(0.75) = 7.5. We expect 7.5 good grapefruits.
(d) σ = √(npq) = √(10(0.75)(0.25)) ≈ 1.37
12. Let success = show up, then p = 0.95, n = 82. µ = np = 82(0.95) = 77.9. If 82 party reservations have been made, 77.9, or about 78, can be expected to show up. σ = √(npq) = √(82(0.95)(0.05)) ≈ 1.97
13. p = 0.85, n = 12
P(r ≤ 2) = P(0) + P(1) + P(2) = 0.000 + 0.000 + 0.000 = 0.000 (to 3 digits)
The data seem to indicate that the percent favoring the increase in fees is less than 85%.
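Computational check (an illustrative sketch, not part of the printed solutions): the binomial answers in Problems 8, 10, and 13 can be reproduced from the formula P(r) = C(n, r) p^r q^(n−r) instead of Table 3. The helper names binom_pmf and binom_cdf are hypothetical, chosen only for this example. Because the table entries are rounded to three places, the last digit of a computed sum may differ slightly from the tabled sum.

```python
from math import comb, sqrt

def binom_pmf(r, n, p):
    """P(r): probability of exactly r successes in n binomial trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def binom_cdf(r, n, p):
    """P(at most r successes) = P(0) + P(1) + ... + P(r)."""
    return sum(binom_pmf(k, n, p) for k in range(r + 1))

# Problem 8(a): n = 20, p = 0.05, probability of at most 2 successes
p_accept = binom_cdf(2, 20, 0.05)

# Problem 10: n = 200, p = 0.80, mean np and standard deviation sqrt(npq)
mu_10 = 200 * 0.80
sigma_10 = sqrt(200 * 0.80 * 0.20)

# Problem 13: n = 12, p = 0.85, probability of at most 2 successes
p_13 = binom_cdf(2, 12, 0.85)

print(f"Problem 8(a): P(r <= 2) = {p_accept:.4f}")
print(f"Problem 10:   mu = {mu_10:.1f}, sigma = {sigma_10:.2f}")
print(f"Problem 13:   P(r <= 2) = {p_13:.3f}")
```

The same two helpers cover the "find n such that" problems: increase n in a loop until 1 − binom_cdf(m − 1, n, p) reaches the target probability.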
14. Let success = do not default, then p = 0.50. Find n such that P(r ≥ 5) = 0.941. Try n = 15.
P(r ≥ 5) = 1 − [P(0) + P(1) + P(2) + P(3) + P(4)] = 1 − (0.000 + 0.000 + 0.003 + 0.014 + 0.042) = 1 − 0.059 = 0.941
You should buy 15 bonds if you want to be 94.1% sure that 5 or more will not default.
15. (a) The Poisson distribution would be a good choice because coughs are a relatively rare occurrence. It is reasonable to assume that they are independent events, and the variable is the number of coughs in a fixed time interval.
(b) λ = 11 per 1 minute. From Table 4 in Appendix II,
P(r ≤ 3) = P(0) + P(1) + P(2) + P(3) = 0.0000 + 0.0002 + 0.0010 + 0.0037 = 0.0049
(c) λ = (11/60 seconds) · (0.5/0.5) = 5.5/30 seconds; λ = 5.5 per 30 seconds
P(r ≥ 3) = 1 − P(0) − P(1) − P(2) = 1 − 0.0041 − 0.0225 − 0.0618 = 0.9116
16. (a) The Poisson distribution would be a good choice because number of accidents is a relatively rare occurrence. It is reasonable to assume that they are independent events, and the variable is the number of accidents for a given number of operations.
(b) λ = 2.4 per 100,000 flight operations. From Table 4 in Appendix II, P(0) = 0.0907.
(c) λ = (2.4/100,000) · (2/2) = 4.8/200,000; λ = 4.8 per 200,000 flight operations.
P(r ≥ 4) = 1 − P(0) − P(1) − P(2) − P(3) = 1 − 0.0082 − 0.0395 − 0.0948 − 0.1517 = 0.7058
17. The loan-default problem satisfies the conditions for a binomial experiment. Moreover, p is small, n is large, and np < 10. Using the Poisson approximation to the binomial distribution is appropriate.
n = 300, p = 1/350 ≈ 0.0029, λ = np = 300(0.0029) ≈ 0.86 ≈ 0.9
From Table 4 in Appendix II, P(r ≥ 2) = 1 − P(0) − P(1) = 1 − 0.4066 − 0.3659 = 0.2275
18.
This problem satisfies the conditions for a binomial experiment. Moreover, p is small, n is large, and np < 10. Using the Poisson approximation to the binomial distribution is appropriate.
n = 482, p = 551/100,000 = 0.00551, λ = np = 482(0.00551) ≈ 2.7
(a) P(0) = 0.0672
(b) P(r ≥ 1) = 1 − P(0) = 1 − 0.0672 = 0.9328
(c) P(r ≥ 2) = 1 − P(r ≤ 1) = 1 − (0.0672 + 0.1815) = 0.7513
19. (a) Use the geometric distribution with p = 0.5.
P(n = 2) = (0.5)(0.5) = (0.5)² = 0.25
P(n = 3) = (0.5)(0.5)(0.5) = 0.125
P(n = 4) = (0.5)(0.5)(0.5)(0.5) = 0.0625
This is the geometric probability distribution with p = 0.5.
(b) P(4) = (0.5)(0.5)³ = (0.5)⁴ = 0.0625
P(n > 4) = 1 − P(1) − P(2) − P(3) − P(4) = 1 − 0.5 − (0.5)² − (0.5)³ − (0.5)⁴ = 0.0625
20. (a) Use the geometric distribution with p = 0.83. P(1) = (0.83)(0.17)^(1−1) = 0.83
(b) P(2) = (0.83)(0.17)^(2−1) = 0.1411; P(3) = (0.83)(0.17)^(3−1) = 0.0240
P(2 or 3) = 0.1411 + 0.0240 ≈ 0.165

Normal Distributions (Page 1 of 23)
6.1 Graphs of Normal Probability Distributions
Normal Probability Distribution
Important Properties of a Normal Curve
1. The curve is bell-shaped with the highest point over the mean µ.
2. It is symmetric about the vertical line through µ.
3. The curve approaches the horizontal axis but never crosses or touches it.
4. The transition points (TP) are where the graph changes from cupping upward to cupping downward (or vice versa). The transition points occur at x = µ + σ and x = µ − σ.
5. The total area under the curve is 1.
[Figure: normal curve, also known as the probability density function, over the x-axis, with the mean µ and the transition points (TP) at µ − σ and µ + σ marked]
Guided Exercise 2
[Figure: normal curve over an x-axis marked 6, 8, 10, 12, 14, with points A, B, and C]
a. Which point (A, B or C) corresponds to µ + σ?
b. Which point (A, B or C) corresponds to µ − 2σ?
c. What are the mean and standard deviation of the distribution?
Example A
The mean affects the location of the curve and the standard deviation affects the shape (spread) of the curve.
[Figure: curves A and B, Same Standard Deviation & Different Means] Which curve has the larger mean?
[Figure: curves C and D, Same Mean & Different Standard Deviations] Which curve has the larger standard deviation?
Guided Exercise 1
Determine whether each curve is normal or not. If it is not, then state why.
Empirical Rule for a Normal Distribution
a. Approximately 68.2% of the data will lie within 1 standard deviation of the mean.
b. Approximately 95.4% of the data will lie within 2 standard deviations of the mean.
c. Approximately 99.7% of the data will lie within 3 standard deviations of the mean.
d. The area under the curve represents the probability and the total area under the curve is 1.
[Figure: Empirical Rule. Normal curve over the x-axis marked at µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ, µ + 3σ; areas from left to right: 0.15%, 2.15%, 13.6%, 68.2% (between µ − σ and µ + σ), 13.6%, 2.15%, 0.15%]
Example 1
The playing life of a Sunshine radio is normally distributed with a mean of 600 hours and a standard deviation of 100 hours.
a. Sketch a normal curve showing the distribution of the playing life of the Sunshine radio. Scale and label the axis; include the transition points.
Use the empirical rule to find the probability that a randomly selected radio will last
b. between 600 and 700 hours.
c. between 400 and 500 hours.
d. more than 700 hours.
Guided Exercise 4
The annual wheat yield per acre on a farm is normally distributed with a mean of 35 bushels and a standard deviation of 8 bushels.
a. Sketch a normal curve and shade in the area that represents the probability that an acre will yield between 19 and 35 bushels.
b. Is the shaded area the same as the area between µ − 2σ and µ? Use the empirical rule to find the probability that the yield will be between 19 and 35 bushels per acre.
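Checking the empirical rule numerically (an illustrative sketch, not part of the original notes): the empirical-rule percentages are rounded areas under the normal curve, so they can be compared against areas computed from the normal cumulative distribution function. The helper names normal_cdf and normal_between are hypothetical; the only library function used is math.erf from the Python standard library.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def normal_between(a, b, mu, sigma):
    """Probability that a normal variable falls between a and b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# Example 1(b): radio life between 600 and 700 hours (mu to mu + sigma)
p_radio = normal_between(600, 700, mu=600, sigma=100)

# Guided Exercise 4: wheat yield between 19 and 35 bushels (mu - 2*sigma to mu)
p_wheat = normal_between(19, 35, mu=35, sigma=8)

print(f"P(600 < x < 700) = {p_radio:.4f}  (empirical rule: 68.2% / 2 = 34.1%)")
print(f"P(19 < x < 35)   = {p_wheat:.4f}  (empirical rule: 95.4% / 2 = 47.7%)")
```

Both computed areas agree with half of the corresponding empirical-rule percentage to about three decimal places, which is the symmetry argument the exercises ask for.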
Control Charts
A control chart for a random variable x that is approximately normally distributed is a plot of observed x values in time sequence order. The construction is as follows:
1. Find the mean µ and standard deviation σ of the x distribution in one of two ways: (i) use past data from a period during which the process was “in control” or (ii) use a specified “target” value of µ and σ.
2. Create a graph where the vertical axis represents the x values and the horizontal axis represents time.
3. Draw a solid horizontal line at height µ and horizontal dashed control-limit lines at µ ± 2σ and µ ± 3σ.
4. Plot the variable x on the graph in time sequence order.
Example 2
Susan is director of personnel at Antlers Lodge and hires many college students every summer. One of the biggest activities for the lodge staff is to make the rooms ready for the next guest. Although the rooms are supposed to be ready by 3:30 pm, there are always some rooms not finished because of high turnover. Every 15 days Susan has a control chart of the number of rooms not made up by 3:30 pm each day. From past experience, Susan knows that the distribution of rooms not made up by 3:30 pm is approximately normal, with µ = 19.3 rooms and σ = 4.7 rooms. For the past 15 days the staff has reported the number of rooms not made up by 3:30 pm as:
Day   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
x    11  20  25  23  16  19   8  25  17  20  23  29  18  14  10
a. Make a control chart for these data. Completely annotate.
b. Is the housekeeping process out of control? Explain.
[Blank chart: Control Chart for the Number of Rooms Not Made Up by 3:30 PM; horizontal lines at µ − 3σ, µ − 2σ, µ, µ + 2σ, µ + 3σ]
Out-of-Control Warning Signals
1. Out-of-Control Signal I: One point falls beyond the 3σ level. The probability that this is a false alarm is 0.003.
2. Out-of-Control Signal II: A run of 9 consecutive points on one side of the center line.
The probability that this is a false alarm is \( 2 \cdot (0.5)^9 = 0.004 \).
3. Out-of-Control Signal III: At least 2 of 3 consecutive points lie beyond the 2σ level on the same side of the center line. The probability that this is a false alarm is 0.002.

[Figures: three control charts, one per signal type, each with a center line at µ and control-limit lines at the 2σ and 3σ levels.]

Example 3
Yellowstone Park Medical Services (YPMS) provides emergency medical care for park visitors. History has shown that during the summer the mean number of visitors treated each day is 21.7 with a standard deviation of 4.2. For a 10-day summer period, the following numbers of people were treated:

Day              1   2   3   4   5   6   7   8   9  10
Number Treated  20  15  12  21  24  28  32  36  35  37

a. Make a control chart and plot the data on the chart.
b. Do the data indicate the number of visitors treated by YPMS is in control or out of control? Explain your answer in terms of the three out-of-control signal types.
c. If you were the park superintendent, do you think YPMS might need some extra help? Explain.

[Figure: blank control chart with horizontal lines at µ − 3σ, µ − 2σ, µ, µ + 2σ, and µ + 3σ.]

6.2 Standard Units and The Standard Normal Distribution
Suppose Tina and Jack are in two different sections of the same course and they recently took midterms. Tina's class average was 64 and she got a 74. Jack's class average was 72 and he got an 82. Who did better?

z-Score or Standardized Score
To standardize test scores we use the z-value (or z-score). The z-value, z-score, or standardized score is the number of standard deviations a data value lies away from the mean. The z-score can be positive, negative, or zero depending on whether the data value is above the mean, below the mean, or at the mean, respectively. For any x-value in a normal distribution the standardized score, z-value, or z-score is given by:
\[ z = \frac{x - \mu}{\sigma} \]
Note(s):
1. If x = µ, then z = 0
2. If x > µ, then z > 0
3.
If x < µ, then z < 0

Fact: Unless otherwise stated, from now on, the average will mean the arithmetic mean.

[Figure: normal curve over the x-axis with µ marked. The z-score is negative where x < µ and positive where x > µ. Label: \( z\text{-score} = \frac{x - \mu}{\sigma} \).]

Example 3
Suppose Tina and Jack are in two different sections of the same course and they recently took midterms. The distribution of scores in both classes is normal. Tina's class average was 64 with a standard deviation of 3; she earned a 74. Jack's class average was 72 with a standard deviation of 5; he earned an 82.
a. What was Tina's z-score? Draw a distribution for Tina's class showing Tina's score within the distribution of scores for her class.
b. What was Jack's z-score? Draw a distribution for Jack's class showing Jack's score within the distribution of scores for his class.
c. Who did better? Why?

[Figures: two blank normal curves, one for each class.]

Example 4
A pizza parlor chain claims a large pizza has 8 oz of cheese with a standard deviation of 0.5 oz. An inspector ordered a pizza and found it had only 6.9 oz of cheese. Franchisees can lose their store if they make pizzas with an amount of cheese 3 standard deviations (or more) below the mean. Assume the weights are normally distributed.
a. Graph the x-distribution. Label and scale the axis.
b. Find the z-score for x = 6.9 oz of cheese.
c. Is the franchise in danger of losing its store? Why?
d. Find the minimum amount of cheese a franchise can put on a large pizza so it is not in danger of losing its store.

[Figure: blank normal curve.]

Guided Exercise 6
The time it takes a student to get to class from home is normally distributed with a mean of 17 minutes and a standard deviation of 3 minutes.
a. One day it took 21 minutes to get to class.
How many standard deviations from the mean is that? Explain the sign.
b. Another day it took 12 minutes to get to class. How many standard deviations from the mean is that? Explain the sign.
c. On a third day it took 17 minutes to get to class. How many standard deviations from the mean is that? Explain the sign.

z-Score Formulae
\[ z = \frac{x - \mu}{\sigma} \quad \text{or} \quad x = z\sigma + \mu \]

Guided Exercise 7
Sam's z-score on an exam is 1.3. If the scores are normally distributed with a mean of 480 and a standard deviation of 70, what is Sam's raw score? Draw the distribution.

Standard Normal Distribution / Curve
The standard normal distribution is a normal distribution with mean µ = 0 and standard deviation σ = 1. The standard normal curve is the graph of the standard normal distribution.

[Figure: The Standard Normal Curve, µ = 0, σ = 1, drawn over the z-axis with the two transition points (TP) marked.]

The Normal Cumulative Distribution Function
The area under the normal curve with mean µ and standard deviation σ on the interval from a to b represents the probability that a randomly chosen value for x lies between a and b. It is given by the normal cumulative distribution function (normalcdf): Access the DISTR menu (2nd > VARS) and select 2:normalcdf(lower bound, upper bound, [µ, σ]).

Example 6
Find the probability that z is between -1 and 1.
To show all work and receive full credit:
(a) (20-50% of the credit) Sketch the normal curve and shade in the area in question. Label and scale the axis. Put dots at the transition points and tick marks on the axis to 3 standard deviations on each side of the mean.
(b) (50-70% of the credit) Compute the area under the curve, hence the probability, showing the probability notation, the TI-83 function accessed along with its inputs and output. Round probabilities to 4 decimal places.

[Figure: blank standard normal curve, z-axis scaled from −3 to 3.]
Area = P(−1 < z < 1) = normalcdf(___, ___, ___, ___) = __________

[Figure: normal curve over the x-axis with the area between a and b shaded.]
Area = P(a < x < b) = normalcdf(a, b, [µ, σ])

Example 7
Find the probability that z is
1. between -3 and 3
2. greater than 1
3. between 0 and 2.53
4. greater than 2.53
5. less than -2.34
6. between -2 and 2

[Figures: six blank standard normal curves, one per part, each with the z-axis scaled from −3 to 3.]

6.3 Area Under Any Normal Curve
TI-84: The area under any normal curve between the values a and b is given by
Area = P(a < x < b) = normalcdf(a, b, µ, σ)

[Figure: normal curve over the x-axis with the area between a and b shaded.]

To show all work and receive full credit:
(a) (20-50% of the credit) Sketch the normal curve and shade in the area in question. Label and scale the axis. Put dots at the transition points and tick marks on the axis to 3 standard deviations on each side of the mean.
(b) (50-70% of the credit) Compute the area under the curve, hence the probability, showing the probability notation, the TI-83 function accessed along with its inputs and output. State probabilities to the nearest ten-thousandth (4 decimal places).

Example 7
Given that x has a normal distribution with a mean of 3 and standard deviation of 0.5, find the probability that an x selected at random will be between 2.1 and 3.7. Show your work and include a sketch of the normal curve relevant to this application.

[Figure: blank normal curve.]
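For readers without a TI calculator, the same areas can be computed in software. The sketch below is illustrative only: it defines a hypothetical Python function named `normalcdf` that mimics the calculator's argument order (lower bound, upper bound, optional µ and σ) on top of the standard-library `statistics.NormalDist`. It is an assumption made for this example, not the calculator's own implementation.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the normal curve between lower and upper.

    Hypothetical stand-in for the calculator function; mu and sigma
    default to the standard normal values 0 and 1.
    """
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.cdf(upper) - dist.cdf(lower)


# Standard normal: P(-1 < z < 1) and P(0 < z < 2.53).
p_within_1 = normalcdf(-1, 1)
p_0_to_253 = normalcdf(0, 2.53)

# One-sided area: P(z > 1), written as a complement to avoid an infinite bound.
p_above_1 = 1 - NormalDist().cdf(1)

# Any normal curve: mean 3, standard deviation 0.5, P(2.1 < x < 3.7).
p_21_to_37 = normalcdf(2.1, 3.7, 3, 0.5)

for label, p in [
    ("P(-1 < z < 1)", p_within_1),
    ("P(0 < z < 2.53)", p_0_to_253),
    ("P(z > 1)", p_above_1),
    ("P(2.1 < x < 3.7)", p_21_to_37),
]:
    print(f"{label} = {p:.4f}")
```

Passing µ and σ directly is equivalent to converting the endpoints to z-scores first: 2.1 and 3.7 correspond to z = −1.8 and z = 1.4.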

Example 8
Let x have a normal distribution with a mean of 10 and a standard deviation of 2. Find the probability that an x selected at random from the distribution is between 11 and 14. Show your work and include a sketch of the normal curve representing the probability.

Example A
A factory has a machine that puts corn flakes in boxes that are advertised as 20 ounces each. If the distribution of weights is normal with µ = 20 and σ = 1.5, what is the probability that the weight of a randomly selected box of corn flakes will be between 19 and 21 oz? Show your work and include a sketch of the normal curve representing the probability.

Guided Exercise 10
If the life of a Sunshine Stereo is normally distributed with a mean of 2.3 years and a standard deviation of 0.4 years, what is the probability that a randomly selected stereo will break down during the warranty period of 2 years? Show your work and include a sketch of the normal curve representing the probability.

[Figures: three blank normal curves, one per exercise.]

The Inverse-Normal Function: DISTR / 3:invNorm(
a = invNorm(area to the left of a, µ, σ)
Function: Access the DISTR menu (2nd > VARS) and select 3:invNorm(area to the left of a, [µ, σ]).
Input: "Area to the left of a" = P(x < a); mean µ (default value is 0); standard deviation σ (default value is 1).
Output: The value of a on the x-axis of the normal curve.

[Figure: normal curve over the x-axis with the area P(x < a) to the left of a shaded.]

Guided Exercise 11
Find the value of a on the z-axis so that 3% of the area under the standard normal curve lies to the left of a. Round to two decimal places.

[Figure: blank standard normal curve, z-axis scaled from −3 to 3.]

Example B
a. Draw a standard normal curve.
Then find the value of a > 0 so that 32% of the area under the curve lies between 0 and a.
b. Draw a standard normal curve. Then find the value(s) of a on the z-axis so that 94% of the area under the curve lies between -a and a.
c. Draw a normal curve with a mean of 90 and a standard deviation of 7. Then find the value(s) of b so that 41% of the area under the curve lies between the mean and b.
d. Draw a normal curve with a mean of 45 and a standard deviation of 5. Then find the value of b so that 88% of the area under the curve lies between 45 - b and 45 + b.

[Figures: four blank normal curves, one per part; the curves for parts a and b have the z-axis scaled from −3 to 3.]

Example C
Suppose a distribution is normal with a mean of 44 and a standard deviation of 6. Find the value of a (to the nearest hundredth) on the x-axis so that
a. 66% of the data values lie below a.
b. 15% of the data values lie above a.

Example 9
Magic Video Games Inc. sells expensive computer games and wants to advertise an impressive, full-refund warranty period. It has found that the mean life for its computer games is 30 months with a standard deviation of 4 months. If the life spans of the computer games are normally distributed, how long a warranty period (to the nearest month) can be offered so that the company will not have to refund the price of more than 7% of the computer games?

[Figures: three blank normal curves.]

Exercise 39
Suppose you eat lunch at a restaurant that does not take reservations. Let x represent the time spent waiting to be seated.
It is known that the mean waiting time is 18 minutes with a standard deviation of 4 minutes, and the x distribution is normal. What is the probability that the waiting time will exceed 20 minutes given that it has exceeded 15 minutes? Let
event A = "x > 20 minutes"
event B = "x > 15 minutes"
Answer by completing the following:
a. In terms of events A and B, what is it we want to compute?
b. Is the event "B and A" the same as the event "x > 20" (i.e., event A)?
c. Show that P(B and A) = P(B) ⋅ P(A, given B) is the same as P(x > 20) = P(x > 15) ⋅ P(A, given B).
d. Compute P(A), P(B), and P(A, given B).

[Figure: blank normal curve.]

6.4 Approximate a Binomial Distribution with a Normal Distribution
Fact: If np > 5 and nq > 5 in a binomial distribution, then the sample size n is large enough so that the binomial random variable r has a distribution that is approximately normal. The mean and standard deviation of the normal distribution are estimated by \( \mu = np \) and \( \sigma = \sqrt{npq} \). Furthermore, as the sample size gets larger the approximation gets better.

Approximating a Binomial Distribution Using a Normal Distribution
If np > 5 and nq > 5, then the binomial random variable r has a distribution that is approximately normal. The mean and standard deviation of the normal distribution are estimated by \( \mu = np \) and \( \sigma = \sqrt{npq} \).

Example 12
The owner of a new apartment building needs to have 25 new water heaters installed. Assume the probability that a water heater will last 10 years is 0.25.
(a) What is the probability that 8 or more will last at least 10 years?
(b) Can the binomial probability distribution be approximated by a normal distribution? Why or why not?
(c) Estimate part (a) with a normal distribution. Take note of the continuity correction. That is, remember to subtract 0.5 from the left endpoint of the interval and add 0.5 to the right endpoint.
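The continuity correction in Example 12 can be seen by comparing the exact binomial probability with its normal estimate. The sketch below is one possible way to do this, assuming Python's standard library (`math.comb` and `statistics.NormalDist`); the function names are hypothetical and chosen only for this illustration.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_at_least(n, p, r_min):
    """Exact P(r >= r_min) for a binomial distribution with n trials."""
    q = 1 - p
    return sum(comb(n, r) * p**r * q ** (n - r) for r in range(r_min, n + 1))


def normal_estimate_at_least(n, p, r_min):
    """Normal estimate of P(r >= r_min) with the continuity correction.

    The left endpoint r_min is moved down by 0.5, as described in Example 12.
    """
    q = 1 - p
    mu = n * p
    sigma = sqrt(n * p * q)
    return 1 - NormalDist(mu=mu, sigma=sigma).cdf(r_min - 0.5)


n, p = 25, 0.25

# Part (b): the approximation is reasonable only if np > 5 and nq > 5.
np_value = n * p
nq_value = n * (1 - p)
approximation_ok = np_value > 5 and nq_value > 5

# Parts (a) and (c): exact probability versus normal estimate.
exact = binomial_at_least(n, p, 8)
estimate = normal_estimate_at_least(n, p, 8)

print(f"np = {np_value}, nq = {nq_value}, normal approximation ok: {approximation_ok}")
print(f"exact P(r >= 8)    = {exact:.4f}")
print(f"normal estimate    = {estimate:.4f}")
```

Here np = 6.25 and nq = 18.75 both exceed 5, so the two printed probabilities should agree to within about a hundredth.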

Part IV: Complete Solutions, Chapter 7
Copyright © Houghton Mifflin Company. All rights reserved.

Chapter 7: Introduction to Sampling Distributions

Section 7.1

1. Answers vary. Students should identify the individuals (subjects) and variable involved. For example, the population of all ages of all people in Colorado, the population of weights of all students in your school, or the population count of all antelope in Wyoming.
2. Answers vary. A simple random sample of n measurements from a population is a subset of the population selected in a manner such that
(a) Every sample of size n from the population has an equal chance of being selected.
(b) Every member of the population has an equal chance of being included in the sample.
3. A population parameter is a numerical descriptive measure of a population, such as \( \mu \), the population mean; \( \sigma \), the population standard deviation; \( \sigma^2 \), the population variance; \( p \), the population proportion; the population maximum and minimum; etc.
4. A sample statistic is a numerical descriptive measure of a sample, such as \( \bar{x} \), the sample mean; \( s \), the sample standard deviation; \( s^2 \), the sample variance; \( \hat{p} \), the sample proportion; the sample maximum and minimum; etc.
5. A statistical inference refers to conclusions about the value of a population parameter based on information from the corresponding sample statistic and the associated probability distributions. We will do both estimation and testing.
6. A sampling distribution is a probability distribution for a sample statistic.
7. They help us visualize the sampling distribution by using tables and graphs that approximately represent the sampling distribution.
8. Relative frequencies can be thought of as a measure or estimate of the likelihood of a certain statistic falling within the class bounds.
9. We studied the sampling distribution of mean trout lengths based on samples of size 5. Other such sampling distributions abound.
Notice that the sample size remains the same for each sample in a sampling distribution.

Section 7.2

Note: Answers may vary slightly depending on the number of digits carried in the standard deviation.

1. The standard error is simply the standard deviation for a sampling distribution.
2. The standard error
3. \( \bar{x} \) is an unbiased estimator for \( \mu \), and \( \hat{p} \) is an unbiased estimator for \( p \).
4. As the sample size increases, the variability decreases.
5. (a) The required sample size is n ≥ 30.
(b) No. If the original distribution is normal, then \( \bar{x} \) is distributed normally for any sample size.
6. (a) No, because we require a sample size of n ≥ 30 if the original distribution is not normal.
(b) Yes, the \( \bar{x} \) distribution is normal with mean \( \mu_{\bar{x}} = 72 \) and \( \sigma_{\bar{x}} = \frac{8}{\sqrt{16}} = 2 \).
\[ z = \frac{68 - 72}{2} = -2, \qquad z = \frac{73 - 72}{2} = 0.50 \]
\[ P(68 \le \bar{x} \le 73) = P(-2 \le z \le 0.50) = 0.6687 \]
7. The distribution with n = 225 will have a smaller standard error. Since \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \), dividing by the square root of 225 will result in a small standard error regardless of the value of σ.
8. (a) We require n = 36 because \( \frac{12}{\sqrt{36}} = \frac{12}{6} = 2 \).
(b) We require n = 144 because \( \frac{12}{\sqrt{144}} = \frac{12}{12} = 1 \).
9. (a) \( \mu_{\bar{x}} = \mu = 15, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{14}{\sqrt{49}} = 2.0 \)
Because n = 49 ≥ 30, by the central limit theorem, we can assume that the distribution of \( \bar{x} \) is approximately normal.
\[ z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - 15}{2.0} \]
\( \bar{x} = 15 \) converts to \( z = \frac{15 - 15}{2.0} = 0 \); \( \bar{x} = 17 \) converts to \( z = \frac{17 - 15}{2.0} = 1 \).
\[ P(15 \le \bar{x} \le 17) = P(0 \le z \le 1) = P(z \le 1) - P(z \le 0) = 0.8413 - 0.5000 = 0.3413 \]
(b) \( \mu_{\bar{x}} = \mu = 15, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{14}{\sqrt{64}} = 1.75 \)
Because n = 64 ≥ 30, by the central limit theorem, we can assume that the distribution of \( \bar{x} \) is approximately normal.
\[ z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - 15}{1.75} \]
\( \bar{x} = 15 \) converts to \( z = \frac{15 - 15}{1.75} = 0 \); \( \bar{x} = 17 \) converts to \( z = \frac{17 - 15}{1.75} = 1.14 \).
\[ P(15 \le \bar{x} \le 17) = P(0 \le z \le 1.14) = P(z \le 1.14) - P(z \le 0) = 0.8729 - 0.5000 = 0.3729 \]
(c) The standard deviation of part (b) is smaller because of the larger sample size. Therefore, the distribution about \( \mu_{\bar{x}} \) is narrower.
10. (a) For both distributions, the mean will be \( \mu_{\bar{x}} = 5 \).
(b) The distribution with n = 81 because the standard deviation will be smaller. This distribution will be less spread out around its mean.
(c) The distribution with n = 81 because the standard deviation will be smaller. This distribution will be less spread out around its mean.
11. (a) \( \mu = 75, \quad \sigma = 0.8 \)
\[ P(x < 74.5) = P\left(z < \frac{74.5 - 75}{0.8}\right) = P(z < -0.63) = 0.2643 \]
(b) \( \mu_{\bar{x}} = 75, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.8}{\sqrt{20}} = 0.179 \)
\[ P(\bar{x} < 74.5) = P\left(z < \frac{74.5 - 75}{0.179}\right) = P(z < -2.79) = 0.0026 \]
(c) No. If the weight of only one car were less than 74.5 tons, we could not conclude that the loader is out of adjustment. If the mean weight for a sample of 20 cars were less than 74.5 tons, we would suspect that the loader is malfunctioning because the probability of this event occurring is 0.26% if indeed the distribution is correct.
12. (a) \( \mu = 68, \quad \sigma = 3 \)
\[ P(67 \le x \le 69) = P\left(\frac{67 - 68}{3} \le z \le \frac{69 - 68}{3}\right) = P(-0.33 \le z \le 0.33) = P(z \le 0.33) - P(z \le -0.33) = 0.6293 - 0.3707 = 0.2586 \]
(b) \( \mu_{\bar{x}} = 68, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{9}} = 1 \)
\[ P(67 \le \bar{x} \le 69) = P\left(\frac{67 - 68}{1} \le z \le \frac{69 - 68}{1}\right) = P(-1 \le z \le 1) = P(z \le 1) - P(z \le -1) = 0.8413 - 0.1587 = 0.6826 \]
(c) The probability in part (b) is much higher because the standard deviation is smaller for the \( \bar{x} \) distribution.
13. (a) \( \mu = 85, \quad \sigma = 25 \)
\[ P(x < 40) = P\left(z < \frac{40 - 85}{25}\right) = P(z < -1.8) = 0.0359 \]
(b) The probability distribution of \( \bar{x} \) is approximately normal with \( \mu_{\bar{x}} = 85; \ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{25}{\sqrt{2}} = 17.68 \).
\[ P(\bar{x} < 40) = P\left(z < \frac{40 - 85}{17.68}\right) = P(z < -2.55) = 0.0054 \]
(c) \( \mu_{\bar{x}} = 85, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{25}{\sqrt{3}} = 14.43 \)
\[ P(\bar{x} < 40) = P\left(z < \frac{40 - 85}{14.43}\right) = P(z < -3.12) = 0.0009 \]
(d) \( \mu_{\bar{x}} = 85, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{25}{\sqrt{5}} = 11.2 \)
\[ P(\bar{x} < 40) = P\left(z < \frac{40 - 85}{11.2}\right) = P(z < -4.02) < 0.0002 \]
(e) Yes. The more tests a patient completes, the stronger is the evidence for excess insulin. If the average value based on five tests were less than 40, the patient is almost certain to have excess insulin.
14. \( \mu = 7500, \quad \sigma = 1750 \)
(a) \[ P(x < 3500) = P\left(z < \frac{3500 - 7500}{1750}\right) = P(z < -2.29) = 0.0110 \]
(b) The probability distribution of \( \bar{x} \) is approximately normal with \( \mu_{\bar{x}} = 7500; \ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1750}{\sqrt{2}} = 1237.44 \).
\[ P(\bar{x} < 3500) = P\left(z < \frac{3500 - 7500}{1237.44}\right) = P(z < -3.23) = 0.0006 \]
(c) \( \mu_{\bar{x}} = 7500, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1750}{\sqrt{3}} = 1010.36 \)
\[ P(\bar{x} < 3500) = P\left(z < \frac{3500 - 7500}{1010.36}\right) = P(z < -3.96) < 0.0002 \]
(d) The probabilities decreased as n increased. It would be an extremely rare event for a person to have two or three tests below 3,500 purely by chance. The person probably has leukopenia.
15. (a) \( \mu = 63.0, \quad \sigma = 7.1 \)
\[ P(x < 54) = P\left(z < \frac{54 - 63.0}{7.1}\right) = P(z < -1.27) = 0.1020 \]
(b) The expected number undernourished is 2,200 × 0.1020 = 224.4, or about 224.
(c) \( \mu_{\bar{x}} = 63.0, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{7.1}{\sqrt{50}} = 1.004 \)
\[ P(\bar{x} < 60) = P\left(z < \frac{60 - 63.0}{1.004}\right) = P(z < -2.99) = 0.0014 \]
(d) \( \mu_{\bar{x}} = 63.0, \quad \sigma_{\bar{x}} = 1.004 \)
\[ P(\bar{x} < 64.2) = P\left(z < \frac{64.2 - 63.0}{1.004}\right) = P(z < 1.20) = 0.8849 \]
Since the sample average is above the mean, it is quite unlikely that the doe population is undernourished.
16. (a) By the central limit theorem, the sampling distribution of \( \bar{x} \) is approximately normal with mean \( \mu_{\bar{x}} = \mu = \$20 \) and standard error \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\$7}{\sqrt{100}} = \$0.70 \). It is not necessary to make any assumption about the x distribution because n is large.
(b) \( \mu_{\bar{x}} = \$20, \quad \sigma_{\bar{x}} = \$0.70 \)
\[ P(\$18 \le \bar{x} \le \$22) = P\left(\frac{\$18 - \$20}{\$0.70} \le z \le \frac{\$22 - \$20}{\$0.70}\right) = P(-2.86 \le z \le 2.86) = P(z \le 2.86) - P(z \le -2.86) = 0.9979 - 0.0021 = 0.9958 \]
(c) \( \mu = \$20, \quad \sigma = \$7 \)
\[ P(\$18 \le x \le \$22) = P\left(\frac{\$18 - \$20}{\$7} \le z \le \frac{\$22 - \$20}{\$7}\right) = P(-0.29 \le z \le 0.29) = 0.6141 - 0.3859 = 0.2282 \]
(d) We expect the probability in part (b) to be much higher than the probability in part (c) because the standard deviation is smaller for the \( \bar{x} \) distribution than it is for the x distribution. By the central limit theorem, the sampling distribution of \( \bar{x} \) will be approximately normal as n increases, and its standard deviation \( \frac{\sigma}{\sqrt{n}} \) will decrease as n increases. For a fixed interval, such as 18 to 22 dollars, centered at the mean, 20 dollars in this case, the proportion of the possible \( \bar{x} \) values within the interval will be greater than the proportion of the possible x values within the same interval. A sample of 100 customers contains much more information about purchasing tendencies than a single customer, so averages are much more predictable than a single observation.
17. (a) The random variable x is itself an average based on the number of stocks or bonds in the fund. Since x itself represents a sample mean return based on a large (random) sample of size n = 250 of stocks or bonds, x has a distribution that is approximately normal (central limit theorem).
(b) \( \mu_{\bar{x}} = 1.6\%, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.9\%}{\sqrt{6}} = 0.367\% \)
\[ P(1\% \le \bar{x} \le 2\%) = P\left(\frac{1\% - 1.6\%}{0.367\%} \le z \le \frac{2\% - 1.6\%}{0.367\%}\right) = P(-1.63 \le z \le 1.09) = P(z \le 1.09) - P(z \le -1.63) = 0.8621 - 0.0516 = 0.8105 \]
(c) Note: 2 years = 24 months; x is monthly percentage return.
\( \mu_{\bar{x}} = 1.6\%, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.9\%}{\sqrt{24}} = 0.1837\% \)
\[ P(1\% \le \bar{x} \le 2\%) = P\left(\frac{1\% - 1.6\%}{0.1837\%} \le z \le \frac{2\% - 1.6\%}{0.1837\%}\right) = P(-3.27 \le z \le 2.18) = P(z \le 2.18) - P(z \le -3.27) = 0.9854 - 0.0005 = 0.9849 \]
(d) Yes. The probability increases as the standard deviation decreases. The standard deviation decreases as the sample size increases.
(e) \( \mu_{\bar{x}} = 1.6\%, \quad \sigma_{\bar{x}} = 0.1837\% \)
\[ P(\bar{x} < 1\%) = P\left(z < \frac{1\% - 1.6\%}{0.1837\%}\right) = P(z < -3.27) = 0.0005 \]
This is very unlikely if µ = 1.6%. One would suspect that µ has slipped below 1.6%.
18. (a) The random variable x is itself an average based on the number of stocks in the fund. Since x itself represents a sample mean return based on a large (random) sample of size n = 100 of stocks, x has a distribution that is approximately normal (central limit theorem).
(b) \( \mu_{\bar{x}} = 1.4\%, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.8\%}{\sqrt{9}} = 0.2667\% \)
\[ P(1\% \le \bar{x} \le 2\%) = P\left(\frac{1\% - 1.4\%}{0.2667\%} \le z \le \frac{2\% - 1.4\%}{0.2667\%}\right) = P(-1.50 \le z \le 2.25) = P(z \le 2.25) - P(z \le -1.50) = 0.9878 - 0.0668 = 0.9210 \]
(c) \( \mu_{\bar{x}} = 1.4\%, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.8\%}{\sqrt{18}} = 0.1886\% \)
\[ P(1\% \le \bar{x} \le 2\%) = P\left(\frac{1\% - 1.4\%}{0.1886\%} \le z \le \frac{2\% - 1.4\%}{0.1886\%}\right) = P(-2.12 \le z \le 3.18) = P(z \le 3.18) - P(z \le -2.12) = 0.9993 - 0.0170 = 0.9823 \]
(d) Yes. The probability increases as the standard deviation decreases. The standard deviation decreases as the sample size increases.
(e) \( \mu_{\bar{x}} = 1.4\%, \quad \sigma_{\bar{x}} = 0.1886\% \)
\[ P(\bar{x} > 2\%) = P\left(z > \frac{2\% - 1.4\%}{0.1886\%}\right) = P(z > 3.18) = 1 - P(z \le 3.18) = 1 - 0.9993 = 0.0007 \]
This is very unlikely if µ = 1.4%. One would suspect that the European stock market may be heating up; i.e., µ is greater than 1.4%.
19. (a) The total checkout time for 30 customers is the sum of the checkout times for each individual customer. Thus \( w = x_1 + x_2 + \cdots + x_{30} \), and the probability that the total checkout time for the next 30 customers is less than 90 is \( P(w < 90) \).
(b) If we divide both sides of \( w < 90 \) by 30, we obtain \( \frac{w}{30} < 3 \). However, w is the sum of 30 waiting times, so \( \frac{w}{30} \) is \( \bar{x} \). Therefore, \( P(w < 90) = P(\bar{x} < 3) \).
(c) The probability distribution of \( \bar{x} \) is approximately normal with mean \( \mu_{\bar{x}} = \mu = 2.7 \) and standard deviation \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.6}{\sqrt{30}} = 0.1095 \).
(d) \[ P(\bar{x} < 3) = P\left(z < \frac{3 - 2.7}{0.1095}\right) = P(z < 2.74) = 0.9969 \]
The probability that the total checkout time for the next 30 customers is less than 90 minutes is 0.9969.
20. Let \( w = x_1 + x_2 + \cdots + x_{36} \).
(a) \( w < 320 \) is equivalent to \( \frac{w}{36} < \frac{320}{36} \), or \( \bar{x} < 8.889 \).
\( \mu_{\bar{x}} = \mu = 8.5, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.5}{\sqrt{36}} = 0.4167 \)
\[ P(w < 320) = P(\bar{x} < 8.889) = P\left(z < \frac{8.889 - 8.5}{0.4167}\right) = P(z < 0.93) = 0.8238 \]
(b) \( w > 275 \) is equivalent to \( \frac{w}{36} > \frac{275}{36} \), or \( \bar{x} > 7.639 \).
\( \mu_{\bar{x}} = 8.5, \quad \sigma_{\bar{x}} = 0.4167 \)
\[ P(w > 275) = P(\bar{x} > 7.639) = P\left(z > \frac{7.639 - 8.5}{0.4167}\right) = P(z > -2.07) = 1 - P(z \le -2.07) = 1 - 0.0192 \approx 0.9808 \]
(c) \[ P(275 < w < 320) = P(7.639 < \bar{x} < 8.889) = P(-2.07 < z < 0.93) = P(z < 0.93) - P(z < -2.07) = 0.8238 - 0.0192 = 0.8046 \]
21. (a) Let \( w = x_1 + x_2 + \cdots + x_5 \).
\( \mu_{\bar{x}} = \mu = 17, \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.3}{\sqrt{5}} = 1.476 \)
\[ P(w > 90) = P\left(\frac{w}{5} > \frac{90}{5}\right) = P(\bar{x} > 18) = P\left(z > \frac{18 - 17}{1.476}\right) = P(z > 0.68) = 1 - 0.7517 = 0.2483 \]
(b) \[ P(w < 80) = P\left(\frac{w}{5} < \frac{80}{5}\right) = P(\bar{x} < 16) = P\left(z < \frac{16 - 17}{1.476}\right) = P(z < -0.68) = 0.2483 \]
(c) \[ P(80 < w < 90) = P(16 < \bar{x} < 18) = P(-0.68 < z < 0.68) = P(z < 0.68) - P(z < -0.68) = 0.7517 - 0.2483 = 0.5034 \]

Section 7.3

1. We must check that np > 5 and nq > 5.
2. \( \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \) and \( \mu_{\hat{p}} = p \)
3. Yes, it is unbiased. The mean of the distribution for \( \hat{p} \) is p.
4. (a) 0.5/25 = 0.02
(b) 0.5/100 = 0.005
(c) As n increases, the continuity correction decreases.
5.
(a) \( np = 33(0.21) = 6.93, \quad nq = 33(0.79) = 26.07 \)
Yes, \( \hat{p} \) can be approximated by a normal random variable because both np and nq exceed 5.
\( \mu_{\hat{p}} = p = 0.21, \quad \sigma_{\hat{p}} = \sqrt{\frac{(0.21)(0.79)}{33}} \approx 0.071 \)
Continuity correction \( = \frac{0.5}{n} = \frac{0.5}{33} \approx 0.015 \)
\[ P(0.15 \le \hat{p} \le 0.25) \approx P(0.15 - 0.015 \le x \le 0.25 + 0.015) = P(0.135 \le x \le 0.265) = P\left(\frac{0.135 - 0.21}{0.071} \le z \le \frac{0.265 - 0.21}{0.071}\right) = P(-1.06 \le z \le 0.77) = P(z \le 0.77) - P(z \le -1.06) = 0.7794 - 0.1446 = 0.6348 \]
(b) No, because np = 25 × 0.15 = 3.75 does not exceed 5.
(c) \( np = 48(0.15) = 7.2, \quad nq = 48(0.85) = 40.8 \)
Yes, \( \hat{p} \) can be approximated by a normal random variable because both np and nq exceed 5.
\( \mu_{\hat{p}} = p = 0.15, \quad \sigma_{\hat{p}} = \sqrt{\frac{(0.15)(0.85)}{48}} \approx 0.052 \)
Continuity correction \( = \frac{0.5}{n} = \frac{0.5}{48} \approx 0.010 \)
\[ P(\hat{p} \ge 0.22) \approx P(x \ge 0.22 - 0.010) = P(x \ge 0.21) = P\left(z \ge \frac{0.21 - 0.15}{0.052}\right) = P(z \ge 1.15) = 1 - P(z < 1.15) = 1 - 0.8749 = 0.1251 \]
6. (a) \( n = 50, \ p = 0.36; \quad np = 50(0.36) = 18, \quad nq = 50(0.64) = 32 \)
Approximate \( \hat{p} \) by a normal random variable because both np and nq exceed 5.
\( \mu_{\hat{p}} = p = 0.36, \quad \sigma_{\hat{p}} = \sqrt{\frac{(0.36)(0.64)}{50}} \approx 0.068 \)
Continuity correction \( = \frac{0.5}{n} = \frac{0.5}{50} = 0.01 \)
\[ P(0.30 \le \hat{p} \le 0.45) \approx P(0.30 - 0.01 \le x \le 0.45 + 0.01) = P(0.29 \le x \le 0.46) = P\left(\frac{0.29 - 0.36}{0.068} \le z \le \frac{0.46 - 0.36}{0.068}\right) = P(-1.03 \le z \le 1.47) = P(z \le 1.47) - P(z \le -1.03) = 0.9292 - 0.1515 = 0.7777 \]
(b) \( n = 38, \ p = 0.25; \quad np = 38(0.25) = 9.5, \quad nq = 38(0.75) = 28.5 \)
Approximate \( \hat{p} \) by a normal random variable because both np and nq exceed 5.
\( \mu_{\hat{p}} = p = 0.25, \quad \sigma_{\hat{p}} = \sqrt{\frac{(0.25)(0.75)}{38}} \approx 0.070 \)
Continuity correction \( = \frac{0.5}{n} = \frac{0.5}{38} = 0.013 \)
\[ P(\hat{p} > 0.35) \approx P(x > 0.35 - 0.013) = P(x > 0.337) = P\left(z > \frac{0.337 - 0.25}{0.070}\right) = P(z > 1.24) = 1 - P(z \le 1.24) = 1 - 0.8925 = 0.1075 \]
(c) \( n = 41, \ p = 0.09; \quad np = 41(0.09) = 3.69 \)
We cannot approximate \( \hat{p} \) by a normal random variable because np < 5.
7. \( n = 30, \ p = 0.60; \quad np = 30(0.60) = 18, \quad nq = 30(0.40) = 12 \)
Approximate \( \hat{p} \) by a normal random variable because both np and nq exceed 5.
\( \mu_{\hat{p}} = p = 0.6, \quad \sigma_{\hat{p}} = \sqrt{\frac{(0.6)(0.4)}{30}} \approx 0.089 \)
Continuity correction \( = \frac{0.5}{n} = \frac{0.5}{30} = 0.017 \)
(a) \[ P(\hat{p} \ge 0.5) \approx P(x \ge 0.5 - 0.017) = P(x \ge 0.483) = P\left(z \ge \frac{0.483 - 0.6}{0.089}\right) = P(z \ge -1.31) = 0.9049 \]
(b) \[ P(\hat{p} \ge 0.667) \approx P(x \ge 0.667 - 0.017) = P(x \ge 0.65) = P\left(z \ge \frac{0.65 - 0.6}{0.089}\right) = P(z \ge 0.56) = 0.2877 \]
(c) \[ P(\hat{p} \le 0.333) \approx P(x \le 0.333 + 0.017) = P(x \le 0.35) = P\left(z \le \frac{0.35 - 0.6}{0.089}\right) = P(z \le -2.81) = 0.0025 \]
(d) Yes, both np and nq exceed 5.
8. (a) \( n = 38, \ p = 0.73; \quad np = 38(0.73) = 27.74, \quad nq = 38(0.27) = 10.26 \)
Approximate \( \hat{p} \) by a normal random variable because both np and nq exceed 5.
\( \mu_{\hat{p}} = p = 0.73, \quad \sigma_{\hat{p}} = \sqrt{\frac{(0.73)(0.27)}{38}} \approx 0.072 \)
Continuity correction \( = \frac{0.5}{n} = \frac{0.5}{38} = 0.013 \)
\[ P(\hat{p} \ge 0.667) \approx P(x \ge 0.667 - 0.013) = P(x \ge 0.654) = P\left(z \ge \frac{0.654 - 0.73}{0.072}\right) = P(z \ge -1.06) = 0.8554 \]
(b) \( n = 45, \ p = 0.86; \quad np = 45(0.86) = 38.7, \quad nq = 45(0.14) = 6.3 \)
Approximate \( \hat{p} \) by a normal random variable because both np and nq exceed 5.
\( \mu_{\hat{p}} = p = 0.86, \quad \sigma_{\hat{p}} = \sqrt{\frac{(0.86)(0.14)}{45}} \approx 0.052 \)
Continuity correction \( = \frac{0.5}{n} = \frac{0.5}{45} = 0.011 \)
\[ P(\hat{p} \ge 0.667) \approx P(x \ge 0.667 - 0.011) = P(x \ge 0.656) = P\left(z \ge \frac{0.656 - 0.86}{0.052}\right) = P(z \ge -3.92) \approx 1 \]
(c) Yes, both np and nq exceed 5 for men and for women.
9. (a) \( n = 100, \ p = 0.06; \quad np = 100(0.06) = 6, \quad nq = 100(0.94) = 94 \)
\( \hat{p} \) can be approximated by a normal random variable because both np and nq exceed 5.
\( \mu_{\hat{p}} = p = 0.06, \quad \sigma_{\hat{p}} = \sqrt{\frac{(0.06)(0.94)}{100}} \approx 0.024 \)
Continuity correction \( = \frac{0.5}{100} = 0.005 \)
(b) \[ P(\hat{p} \ge 0.07) \approx P(x \ge 0.07 - 0.005) = P(x \ge 0.065) = P\left(z \ge \frac{0.065 - 0.06}{0.024}\right) = P(z \ge 0.21) = 0.4168 \]
(c) \[ P(\hat{p} \ge 0.11) \approx P(x \ge 0.11 - 0.005) = P(x \ge 0.105) = P\left(z \ge \frac{0.105 - 0.06}{0.024}\right) = P(z \ge 1.88) = 0.0301 \]
Yes; because this probability is so small, it should rarely occur. The machine might need an adjustment.
10. (a) \( n = 50, \ p = 0.565; \quad np = 50(0.565) = 28.25, \quad nq = 50(0.435) = 21.75 \)
\( \hat{p} \) can be approximated by a normal random variable because both np and nq exceed 5.
\( \mu_{\hat{p}} = p = 0.565, \quad \sigma_{\hat{p}} = \sqrt{\frac{(0.565)(0.435)}{50}} \approx 0.070 \)
Continuity correction \( = \frac{0.5}{n} = \frac{0.5}{50} = 0.01 \)
(b) \[ P(\hat{p} \le 0.53) \approx P(x \le 0.53 + 0.01) = P(x \le 0.54) = P\left(z \le \frac{0.54 - 0.565}{0.070}\right) = P(z \le -0.36) = 0.3594 \]
(c) \[ P(\hat{p} \le 0.41) \approx P(x \le 0.41 + 0.01) = P(x \le 0.42) = P\left(z \le \frac{0.42 - 0.565}{0.070}\right) = P(z \le -2.07) = 0.0192 \]
(d) Meredith has the more serious case because the probability of having such a low reading in a healthy person is less than 2%.
11.
$\bar p = \frac{\text{total number of successes from all 12 quarters}}{\text{total number of families from all 12 quarters}} = \frac{11 + 14 + \cdots + 19}{12(92)} = \frac{206}{1{,}104} = 0.1866$
$\bar q = 1 - \bar p = 1 - 0.1866 = 0.8134$
$\mu_{\hat p} = p \approx \bar p = 0.1866$
$\sigma_{\hat p} = \sqrt{\frac{pq}{n}} \approx \sqrt{\frac{\bar p \bar q}{n}} = \sqrt{\frac{0.1866(0.8134)}{92}} \approx 0.0406$
Check: $n\bar p = 92(0.1866) = 17.2$, $n\bar q = 92(0.8134) = 74.8$
Since both $n\bar p$ and $n\bar q$ exceed 5, the normal approximation should be reasonably good.
Center line $= \bar p = 0.1866$
Control limits at $\bar p \pm 2\sqrt{\frac{\bar p \bar q}{n}} = 0.1866 \pm 2(0.0406) = 0.1866 \pm 0.0812$, or 0.1054 and 0.2678
Control limits at $\bar p \pm 3\sqrt{\frac{\bar p \bar q}{n}} = 0.1866 \pm 3(0.0406) = 0.1866 \pm 0.1218$, or 0.0648 and 0.3084
[Figure: P Chart of Victims. Horizontal axis: Quarter (1 to 12); vertical axis: Proportion. Center line P̄ = 0.1866; +3SL = 0.3084, −3SL = 0.0647; +2SL = 0.2678, −2SL = 0.1054.]
There are no out-of-control signals.

12. $\bar p = \frac{\text{total number of defective cans}}{\text{total number of cans}} = \frac{8 + 11 + \cdots + 10}{110(15)} = \frac{133}{1650} = 0.08061$
$\bar q = 1 - \bar p = 1 - 0.08061 = 0.91939$
$\mu_{\hat p} = p \approx \bar p = 0.08061$
$\sigma_{\hat p} = \sqrt{\frac{pq}{n}} \approx \sqrt{\frac{\bar p \bar q}{n}} = \sqrt{\frac{0.08061(0.91939)}{110}} \approx 0.02596$
Check: $n\bar p = 110(0.08061) = 8.9$, $n\bar q = 110(0.91939) = 101.1$
Since both $n\bar p$ and $n\bar q$ exceed 5, the normal approximation should be reasonably good.
Center line $= \bar p = 0.08061$
Control limits at $\bar p \pm 2\sqrt{\frac{\bar p \bar q}{n}} = 0.08061 \pm 2(0.02596) = 0.08061 \pm 0.05192$, or 0.02869 and 0.1325.
Control limits at $\bar p \pm 3\sqrt{\frac{\bar p \bar q}{n}} = 0.08061 \pm 3(0.02596) = 0.08061 \pm 0.07788$, or 0.00273 and 0.1585.
[Figure: P Chart of Defective Cans. Horizontal axis: Test Sheet (1 to 15); vertical axis: Proportion. Center line P̄ = 0.0806; +3SL = 0.1585, −3SL = 0.0027; +2SL = 0.1325, −2SL = 0.0287.]
There are no out-of-control signals. It appears that the production process is in reasonable control.

13.
$\bar p = \frac{\text{total number who got jobs}}{\text{total number of people}} = \frac{60 + 53 + \cdots + 58}{75(15)} = \frac{872}{1125} = 0.7751$
$\bar q = 1 - \bar p = 1 - 0.7751 = 0.2249$
$\mu_{\hat p} = p \approx \bar p = 0.7751$
$\sigma_{\hat p} = \sqrt{\frac{pq}{n}} \approx \sqrt{\frac{\bar p \bar q}{n}} = \sqrt{\frac{0.7751(0.2249)}{75}} \approx 0.0482$
Check: $n\bar p = 75(0.7751) = 58.1$, $n\bar q = 75(0.2249) = 16.9$
Since both $n\bar p$ and $n\bar q$ exceed 5, the normal approximation should be reasonably good.
Center line $= \bar p = 0.7751$
Control limits at $\bar p \pm 2\sqrt{\frac{\bar p \bar q}{n}} = 0.7751 \pm 2(0.0482) = 0.7751 \pm 0.0964$, or 0.6787 to 0.8715.
Control limits at $\bar p \pm 3\sqrt{\frac{\bar p \bar q}{n}} = 0.7751 \pm 3(0.0482) = 0.7751 \pm 0.1446$, or 0.6305 to 0.9197.
[Figure: P Chart of Jobs. Horizontal axis: Day (1 to 15); vertical axis: Proportion. Center line P̄ = 0.7751; +3SL = 0.9197, −3SL = 0.6305; +2SL = 0.8715, −2SL = 0.6787.]
Out-of-control signal III occurs on days 4 and 5; out-of-control signal I occurs on day 11 on the low side and day 14 on the high side. Out-of-control signals on the low side are of most concern for the homeless seeking work. The foundation should look to see what happened on that day. The foundation might take a look at the out-of-control periods on the high side to see if there is a possibility of cultivating more jobs.

Chapter 7 Review

1. (a) The $\bar x$ distribution approaches a normal distribution.
(b) The mean $\mu_{\bar x}$ of the $\bar x$ distribution equals the mean $\mu$ of the $x$ distribution regardless of the sample size.
(c) The standard deviation $\sigma_{\bar x}$ of the sampling distribution equals $\frac{\sigma}{\sqrt n}$, where $\sigma$ is the standard deviation of the $x$ distribution and $n$ is the sample size.
(d) They will both be approximately normal with the same mean, but the standard deviations will be $\frac{\sigma}{\sqrt{50}}$ and $\frac{\sigma}{\sqrt{100}}$, respectively.

2. All the $\bar x$ distributions will be normal with mean $\mu_{\bar x} = \mu = 15$. The standard deviations will be
$n = 4$: $\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{3}{\sqrt 4} = \frac{3}{2}$
$n = 16$: $\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{3}{\sqrt{16}} = \frac{3}{4}$
$n = 100$: $\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{3}{\sqrt{100}} = \frac{3}{10}$

3.
(a) $\mu = 35$, $\sigma = 7$
$P(x \ge 40) = P\left(z \ge \frac{40 - 35}{7}\right) = P(z \ge 0.71) = 0.2389$
(b) $\mu_{\bar x} = \mu = 35$, $\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{7}{\sqrt 9} = \frac{7}{3}$
$P(\bar x \ge 40) = P\left(z \ge \frac{40 - 35}{7/3}\right) = P(z \ge 2.14) = 0.0162$

4. (a) $\mu = 38$, $\sigma = 5$
$P(x \le 35) = P\left(z \le \frac{35 - 38}{5}\right) = P(z \le -0.6) = 0.2743$
(b) $\mu_{\bar x} = \mu = 38$, $\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{5}{\sqrt{10}} \approx 1.58$
$P(\bar x \le 35) = P\left(z \le \frac{35 - 38}{1.58}\right) = P(z \le -1.90) = 0.0287$
(c) The probability in part (b) is much smaller because the standard deviation is smaller for the $\bar x$ distribution.

5. $\mu_{\bar x} = \mu = 100$, $\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{15}{\sqrt{100}} = 1.5$
$P(100 - 2 \le \bar x \le 100 + 2) = P(98 \le \bar x \le 102) = P\left(\frac{98 - 100}{1.5} \le z \le \frac{102 - 100}{1.5}\right) = P(-1.33 \le z \le 1.33) = P(z \le 1.33) - P(z \le -1.33) = 0.9082 - 0.0918 = 0.8164$

6. $\mu_{\bar x} = \mu = 15$, $\sigma_{\bar x} = \frac{\sigma}{\sqrt n} = \frac{2}{\sqrt{36}} \approx 0.333$
$P(15 - 0.5 \le \bar x \le 15 + 0.5) = P(14.5 \le \bar x \le 15.5) = P\left(\frac{14.5 - 15}{0.333} \le z \le \frac{15.5 - 15}{0.333}\right) = P(-1.5 \le z \le 1.5) = P(z \le 1.5) - P(z \le -1.5) = 0.9332 - 0.0668 = 0.8664$

7. (a) $n = 50$, $p = 0.22$, $np = 50(0.22) = 11$, $nq = 50(0.78) = 39$
Approximate $\hat p$ by a normal random variable because both $np$ and $nq$ exceed 5.
$\mu_{\hat p} = p = 0.22$, $\sigma_{\hat p} = \sqrt{\frac{0.22(0.78)}{50}} \approx 0.0586$
Continuity correction $= \frac{0.5}{n} = \frac{0.5}{50} = 0.01$
$P(0.20 \le \hat p \le 0.25) \approx P(0.20 - 0.01 \le x \le 0.25 + 0.01) = P(0.19 \le x \le 0.26) = P\left(\frac{0.19 - 0.22}{0.0586} \le z \le \frac{0.26 - 0.22}{0.0586}\right) = P(-0.51 \le z \le 0.68) = P(z \le 0.68) - P(z \le -0.51) = 0.7517 - 0.3050 = 0.4467$
(b) $n = 38$, $p = 0.27$, $np = 38(0.27) = 10.26$, $nq = 38(0.73) = 27.74$
Approximate $\hat p$ by a normal random variable because both $np$ and $nq$ exceed 5.
$\mu_{\hat p} = p = 0.27$, $\sigma_{\hat p} = \sqrt{\frac{0.27(0.73)}{38}} \approx 0.0720$
Continuity correction $= \frac{0.5}{n} = \frac{0.5}{38} \approx 0.013$
$P(\hat p \ge 0.35) \approx P(x \ge 0.35 - 0.013) = P(x \ge 0.337) = P\left(z \ge \frac{0.337 - 0.27}{0.0720}\right) = P(z \ge 0.93) = 0.1762$
(c) $n = 51$, $p = 0.05$, $np = 51(0.05) = 2.55$
No, we cannot approximate $\hat p$ by a normal random variable because $np < 5$.

8. $n = 28$, $p = 0.31$, $np = 28(0.31) = 8.68$, $nq = 28(0.69) = 19.32$
Approximate $\hat p$ by a normal random variable because both $np$ and $nq$ exceed 5.
$\mu_{\hat p} = p = 0.31$, $\sigma_{\hat p} = \sqrt{\frac{0.31(0.69)}{28}} \approx 0.087$
Continuity correction $= \frac{0.5}{n} = \frac{0.5}{28} \approx 0.018$
(a) $P(\hat p \ge 0.25) \approx P(x \ge 0.25 - 0.018) = P(x \ge 0.232) = P\left(z \ge \frac{0.232 - 0.31}{0.087}\right) = P(z \ge -0.90) = 0.8159$
(b) $P(0.25 \le \hat p \le 0.50) \approx P(0.25 - 0.018 \le x \le 0.50 + 0.018) = P(0.232 \le x \le 0.518) = P\left(\frac{0.232 - 0.31}{0.087} \le z \le \frac{0.518 - 0.31}{0.087}\right) = P(-0.90 \le z \le 2.39) = P(z \le 2.39) - P(z \le -0.90) = 0.9916 - 0.1841 = 0.8075$
(c) Yes, both $np$ and $nq$ exceed 5.

8.1 Estimating µ When σ is Known

Assumptions about the random variable x
1. We have a simple random sample of size n drawn from the population of x values.
2. The value of σ, the population standard deviation, is known.
3. If the x distribution is normal, then our methods work for any sample size n.
4. If x has an unknown distribution, then we require the sample size n ≥ 30. However, if the x distribution is not mound-shaped, then a sample size of 50 or 100 may be needed.

Point Estimate
An estimate of a population parameter given by a single number is called a point estimate of that population parameter. For example: x̄ is a point estimate for µ, and s is a point estimate for σ.

Margin of Error
The margin of error in using x̄ as a point estimate for µ is given by E = |x̄ − µ|.
A point estimate is not very useful unless we have some kind of measure of how “good” it is. This “measure of goodness” is expressed as a confidence interval.

Confidence Interval and Level of Confidence
Suppose 100 students at Palomar were randomly chosen and their heights were measured, yielding a [sample] mean of 5.72 ft with a margin of error of 0.08 ft. Consider the following statements:
1. The population mean is approximately 5.72 feet.
2. There is a 95% probability that the population mean is between 5.64 ft and 5.80 ft. P(5.64 ft ≤ µ ≤ 5.80 ft) = 0.95
3. At a 95% level of confidence the population mean is between 5.64 ft and 5.80 ft.
4. The population mean µ = 5.72 ± 0.08 feet at a 95% level of confidence.
Confidence levels and confidence intervals provide a measure of how well a point estimate estimates a population parameter.

Confidence Interval for µ
A c-percent confidence interval for the population mean µ is an interval computed from sample data in such a way that c is the probability of generating an interval containing the actual value of µ. That is, P(x̄ − E ≤ µ ≤ x̄ + E) = c, where E is the maximum margin of error when estimating µ with x̄.
In words, P(x̄ − E ≤ µ ≤ x̄ + E) = c means . . .
1. The probability that the population mean µ is between x̄ − E and x̄ + E is c.
2. The population mean µ is between x̄ − E and x̄ + E at a confidence level of c.
3. The population mean is x̄ ± E at a c-percent level of confidence.
5. If we repeat the experiment many times with the same sample size, then a proportion c of the intervals calculated will contain the population mean µ. Thus, a proportion 1 − c of the intervals will not contain µ.
[Figure: normal curve over the x̄-axis with the area between x̄ − E and x̄ + E shaded. Shaded area = c. The probability that µ is on this interval is c.]

Example 1
Jackie has been jogging 2 miles a day for years and she records her times.
A sample of 90 of these times has a mean of 15.60 minutes and a known standard deviation of 1.80 minutes.
a. Find a 95% confidence interval for µ. Draw and label the normal distribution illustrating the confidence interval. Solve without using the ZInterval function (see below).
b. Find E, the maximum error in estimating µ with x̄ at the confidence level c.
c. Write the conclusion in probability notation, i.e., P(x̄ − E ≤ µ ≤ x̄ + E) = c
d. Summarize your conclusion in one sentence [relevant to the application].

Using the TI-83/84 ZInterval Function
The ZInterval (STAT / TESTS / 7: ZInterval) function computes a confidence interval for an unknown population mean µ when the population standard deviation is known.
Input: STATS σ, x̄, n, and c-level
Output: The interval from x̄ − E to x̄ + E, where E = 1/2 (interval length)

Example 2
a. Compute the 95% confidence interval in Example 1. Use the ZInterval function.
b. Summarize your results in a complete sentence relevant to this application.
At a 95% level of confidence, the population mean µ of all 2-mi jogging times for Jackie is between 15.23 and 15.97 minutes.

Section 8.1 Homework Instructions
Steps to find a c% confidence interval for µ
1. Sketch a normal curve illustrating the c% confidence interval for µ. Label x̄ − E, x̄ + E, and x̄, where E is the margin of error when estimating µ with x̄ at a confidence level of c.
2. Without using the ZInterval function, compute the c% confidence interval for the population mean. That is,
x̄ − E = invNorm(area to the left of x̄ − E, µ_x̄, σ_x̄)
x̄ + E = invNorm(area to the left of x̄ + E, µ_x̄, σ_x̄)
Use the estimate x̄ ≈ µ_x̄, and σ_x̄ = σ/√n.
3. Find E, the maximum error in estimating µ with x̄ at a confidence level of c. It is computed as follows:
E = “half the interval length” from step 2
E = 1/2 [(x̄ + E) − (x̄ − E)]
4. Write the confidence interval in probability notation, i.e.,
P(x̄ − E ≤ µ ≤ x̄ + E) = c
5. Summarize your results in a concise, complete sentence relevant to the problem. That is,
At the c% confidence level the population mean µ of all ____________________ is between _____ and _____ [units].
[Figure: normal curve over the x̄-axis with x̄ − E and x̄ + E marked.]

Guided Exercise 2
Jason jogs 3 miles per day and records his times. A sample of 90 of these times has a mean of 21.50 minutes and a known standard deviation of 2.11 minutes. Find the 99% confidence interval for the population mean by completing steps 1-5 above.
a. Sketch a normal curve to illustrate the 99% confidence interval for the mean in this application. Label the axis.
b. Without using the ZInterval function, find the 99% confidence interval for the population mean.
c. Find E = “half the interval length”
d. Write the confidence interval in probability notation, i.e., P(x̄ − E ≤ µ ≤ x̄ + E) = c
e. Summarize your results in a concise, complete sentence relevant to the problem.
At a ____% confidence level, the mean of ________________________________________________________________ is between ____________ and _____________.

Guided Exercise 3
An automobile loan company wants to estimate the amount of the average car loan during the past year. A random sample of 200 loans had a mean of $8225 and a known standard deviation of $762. Find the 95% confidence interval for the population mean by completing steps 1-5 above.
a. Sketch a normal curve to illustrate the 95% confidence interval for the mean in this application. Label the axis.
b. Without using the ZInterval function, find the 95% confidence interval for the population mean.
c. Find E = “half the interval length”
d. Write the confidence interval in probability notation, i.e., P(x̄ − E ≤ µ ≤ x̄ + E) = c
e. Summarize your results in a concise, complete sentence relevant to the problem.
At a ____% confidence level, the mean of ________________________________________________________________ is between ____________ and _____________.

8.1 (was 8.4) Estimating the Sample Size n

Critical Value
z_c is called the critical value for a confidence level c if P(−z_c < z < z_c) = c. That is, z_c is the z-score such that the area under the standard normal curve between −z_c and z_c is c. In words we say . . .
a. “the probability that a randomly selected z-value is between −z_c and z_c is c.” Or
b. “at a c-percent level of confidence we can say that a randomly chosen z will be between −z_c and z_c.”
For Example
1. If c = 0.90, then P(−z_0.90 < z < z_0.90) = 0.90. Compute z_0.90.
2. If c = 0.95, then P(−z_0.95 < z < z_0.95) = 0.95. Compute z_0.95.
3. If c = 0.99, then P(−z_0.99 < z < z_0.99) = 0.99. Compute z_0.99.
[Figure: standard normal curve over the z-axis with the area between −z_c and z_c shaded. Shaded area = c.]

Estimating Sample Size n for Estimating µ
If, with a confidence level of c, we want our point estimate x̄ to be within E units of µ, then we choose the sample size n to be
n = (z_c · σ / E)²,
where z_c is the critical value for a confidence level of c.

Example 6
A sample of 50 salmon is caught and weighed. The sample standard deviation of the 50 weights is 2.15 lb. How large a sample should be taken to be 97% confident that the sample mean is within 0.20 lb of the mean weight of the population? Find z_c (to the nearest thousandth) and n. Then summarize your results in a complete sentence relevant to this application.

Example 7
An efficiency expert wants to determine the mean time it takes an employee to assemble a switch on an assembly line. A preliminary study of 45 observations found a sample standard deviation of 78 seconds.
How many more observations are needed to be 92% certain that the mean of the sample will vary from the true mean by no more than 15 seconds? Find z_c (to the nearest thousandth) and n. Then summarize your results in a complete sentence relevant to this application.

Guided Exercise 6
The dean wants to estimate the average teaching experience (in years) of the faculty members. A preliminary random sample of 60 faculty yields a sample standard deviation of 3.4 years. How many more faculty should be sampled to be 99% confident that the sample mean does not differ from the true mean by more than 0.5 years? Find z_c (to the nearest thousandth) and n. Then summarize your results in a complete sentence relevant to this application.

8.2 Estimating µ When σ is Unknown
When the population standard deviation σ is unknown, it is approximated by the sample standard deviation s. The TInterval function works with what is called the Student’s t-distribution, where all statistical “fudge factors” necessary to accommodate approximating σ with s are built into the function.

The TInterval function (TI-83: STAT / TESTS / 8: TInterval)
Input: STATS x̄, s, n, and c-level, or DATA data list and c-level
Output: The interval from x̄ − E to x̄ + E, where E = 1/2 (interval length)

Homework Instructions for Section 8.2
1. Omit exercises #1-4
2. When asked to find a confidence interval, do the following:
a. Find the c% confidence interval for the mean µ. Write it in probability notation.
b. Summarize your results in a complete sentence relevant to the application.

Example 4
An archeologist discovered a new, but extinct, species of miniature horse. The only seven known samples show shoulder heights (in cm) of 45.3, 47.1, 44.2, 46.8, 46.5, 45.5, and 47.6. Find the 99% confidence interval for µ (the mean height of the entire population of ancient horses) and the error E.
Then summarize your results in a complete sentence relevant to this application.
a. Find the 99% confidence interval for the mean µ. Write it in probability notation.
b. Summarize your results in a complete sentence relevant to the application.

Guided Exercise 3
A company produced a trial production run of 37 artificial sapphires. The mean weight is 6.75 carats and the standard deviation is 0.33 carats. Find the 95% confidence interval for the mean weight µ of all artificial sapphires and the error E. Then summarize your results in a complete sentence relevant to this application.

8.3 Estimating p in a Binomial Experiment

Large Sample Size Assumption
If np > 5 and nq > 5, then the sample size n is large enough so that the binomial distribution can be approximated by a normal distribution, and a c% confidence interval for p is expressed as
P(p̂ − E ≤ p ≤ p̂ + E) = c
where p̂ is the point estimate for p.

TI-83 1-PropZInt function: STAT / TESTS / A: 1-PropZInt
Input: x = r = number of successes
n = number of trials
c-level = confidence level
Output: (p̂ − E, p̂ + E), p̂, n
where E (the maximum error in using p̂ as a point estimate for p for the given confidence level) is one-half the interval length.

Example 5
Suppose 800 students were given flu shots and 600 did not get the flu. Assuming all 800 were exposed to the flu:
a. What are S, n, and r (note: r is input as variable x on the TI-83)?
b. What are the point estimates for p and q (i.e., p̂ and q̂)?
c. Is n large enough to approximate the binomial distribution with a normal distribution? Why?
d. Find the 99% confidence interval for p.
e. Summarize your results in a complete sentence relevant to this application.
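The large-sample interval described above can also be sketched outside the calculator. The following is a minimal Python sketch, assuming the interval is p̂ ± z_c·√(p̂q̂/n); the function name `one_prop_z_interval` is hypothetical and is not part of the notes or of the TI-83. It is applied to the flu-shot counts from Example 5 (r = 600, n = 800, c = 0.99).

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_interval(r, n, c):
    """Large-sample confidence interval for a binomial proportion p.

    r: number of successes, n: number of trials, c: confidence level (e.g. 0.99).
    Returns (p_hat, E, lower, upper). Hypothetical helper for illustration only.
    """
    p_hat = r / n
    q_hat = 1 - p_hat
    # Large-sample check from the notes: n*p_hat and n*q_hat must both exceed 5.
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("sample too small for the normal approximation")
    # Critical value z_c: a central area of c leaves (1 - c)/2 in each tail.
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    E = z_c * sqrt(p_hat * q_hat / n)
    return p_hat, E, p_hat - E, p_hat + E


# Example 5: 600 of the 800 vaccinated students did not get the flu.
p_hat, E, lower, upper = one_prop_z_interval(600, 800, 0.99)
print(f"p_hat = {p_hat:.3f}, E = {E:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

Here E is computed directly from the critical value rather than as half the interval length; under the assumed formula the two are the same quantity.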
S = ____  r = ____  n = ____  p̂ = ____  q̂ = ____  np̂ = ____  nq̂ = ____
P(p̂ − E ≤ p ≤ p̂ + E) = 0.99

Guided Exercise 4
A random sample of 195 books at a bookstore showed that 68 of the books were nonfiction.
a. Find S and p̂.
b. Is the sample size large enough to approximate the binomial distribution with a normal distribution? Why?
c. Find the 90% confidence interval for p to the nearest thousandth (3 decimal places).
d. Summarize your results in a complete sentence relevant to this application.

Homework Instructions for Section 8.3 Problems
When asked to find the c% confidence interval for p, do the following four steps.
1. Find S and p̂.
2. Determine if the sample size is large enough to approximate the binomial distribution with a normal distribution.
3. Find the c% confidence interval for p to the nearest thousandth (3 decimal places).
4. Summarize your results in a complete sentence relevant to this application.

A Margin of Error, E, is the maximum error when using a point estimate for a population parameter at a given confidence level.

General Interpretation of Poll Results
1. When a poll states the results of a survey, the proportion reported is p̂ (the sample estimate of the population proportion).
2. The margin of error is the maximal error E of a [95%, usually] confidence interval for p.
3. If p̂ is obtained from a poll, then a 95% confidence interval for the population proportion p is p̂ − E < p < p̂ + E.

Guided Exercise 5
A random sample of 315 households was surveyed. Chances are 19 of 20 that if all adults had been surveyed, the findings would differ from the poll results by no more than 2.6% in either direction. One question was asked: “Which party would do a better job handling education?” The possible responses were Democrats, Republicans, neither, or both. The poll reported that 32% responded Democrat.
a.
What confidence level corresponds to the phrase “chances are 19 of 20 that if . . . .”?
b. What are S, n, and the sample statistic p̂ for the proportion responding Democrat?
c. Find E. Find the 95% confidence interval for p, the proportion of those who would respond Democrat.
d. Summarize your results in a complete sentence relevant to this application.

8.3 Estimating Sample Size n for Estimating p
(a) If, with a confidence level of c, we want our point estimate p̂ to be within E units of p, then we choose the sample size n to be
n = p̂ · q̂ · (z_c / E)²
where z_c is the z-score corresponding to a confidence level of c.
(b) If no estimate for p is available, we can say with a confidence level of at least c that the point estimate p̂ will be within E units of p by choosing
n = 0.25 (z_c / E)²

Example 8
A buyer for a popcorn company wants to estimate the probability p that a kernel purchased from a particular farm will pop. Suppose a random sample of n kernels is taken and r of these kernels pop. The buyer wants to be 95% certain that the point estimate p̂ will be within 0.01 units of p.
a. Find z_c and E.
b. If no estimate for p is available, how large a sample should the buyer use (i.e., how large should n be)?
c. A preliminary study showed that p was approximately 0.86. Now, how large a sample should be used?

Guided Exercise 7
The health department wants to estimate the proportion of children who require corrective lenses for their vision. They want to be 99% sure that the point estimate for p will have a maximum error of 0.03.
a. If no other information is known, find E and z_c. Estimate the sample size required.
b. Suppose a preliminary random sample of 100 children indicates that 23 require corrective lenses. How large should n be?
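The two sample-size formulas in this section, n = p̂·q̂·(z_c/E)² and n = 0.25(z_c/E)², can be sketched in a few lines of Python. This is a minimal sketch, not the notes' own method: the function names `critical_value` and `sample_size_for_p` are hypothetical, and rounding n up to the next whole number is an assumption (the usual convention, since rounding down would fall short of the target E). The values c = 0.95, E = 0.01, and the preliminary estimate 0.86 are taken from Example 8.

```python
from math import ceil
from statistics import NormalDist


def critical_value(c):
    """z_c such that the standard normal area between -z_c and z_c is c."""
    return NormalDist().inv_cdf((1 + c) / 2)


def sample_size_for_p(c, E, p_est=None):
    """Sample size for estimating p to within E at confidence level c.

    With a preliminary estimate p_est, use p_est*(1 - p_est)*(z_c/E)**2;
    with no estimate, use the conservative 0.25*(z_c/E)**2.
    The result is rounded up (assumption: standard practice).
    """
    z_c = critical_value(c)
    pq = 0.25 if p_est is None else p_est * (1 - p_est)
    return ceil(pq * (z_c / E) ** 2)


# Example 8: 95% confidence, p_hat within 0.01 of p.
n_no_estimate = sample_size_for_p(0.95, 0.01)
n_with_estimate = sample_size_for_p(0.95, 0.01, p_est=0.86)
print(n_no_estimate, n_with_estimate)
```

The no-estimate case is never smaller than the case with an estimate, because p̂·q̂ is at most 0.25 (attained at p̂ = 0.5).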
8.4 Estimating µ₁ − µ₂ and p₁ − p₂

Independent and Dependent Samples
In order to make a statistical estimate about the difference between two populations, we need to have a sample from each population. Two samples are independent if the sample from one population is unrelated to the sample from the other. However, if each measurement in one sample can be naturally paired with a measurement in the other sample, the two samples are said to be dependent (such as before and after samples).

Guided Exercise 8
Classify the pairs of samples as dependent or independent.
a. In a medical experiment, one group is given a treatment and another group is given a placebo. After a period of time both groups are measured for the same condition.
b. A group of math students is given a test at the beginning of a course and the same group is given the same test at the end of the course.

Theorem 8.1
Let x₁ and x₂ have normal distributions. If we take independent random samples of size n₁ from x₁ and n₂ from x₂, then the variable x̄₁ − x̄₂ has
1. a normal distribution
2. a mean of µ₁ − µ₂
3. a standard deviation of √(σ₁²/n₁ + σ₂²/n₂)

Estimating µ₁ − µ₂ When σ₁ and σ₂ are Known
A c% confidence interval for µ₁ − µ₂ is expressed as
(x̄₁ − x̄₂) − E < µ₁ − µ₂ < (x̄₁ − x̄₂) + E
This interval is the output of the TI-83 function 2-SampZInt.

TI-83 function 2-SampZInt (STAT / TESTS / 9: 2-SampZInt)
Input: σ₁, σ₂, x̄₁, n₁, x̄₂, n₂, c-level
Output: Interval from (x̄₁ − x̄₂) − E to (x̄₁ − x̄₂) + E
where E is one half the interval length output by the 2-SampZInt function.

Example 9
Suppose a biologist is studying data from Yellowstone streams before and after a 1988 fire. A random sample of 167 fishing reports in the years before the fire showed an average catch per day of 5.2 trout with σ = 1.9 trout.
After the fire a sample of 125 fishing reports showed an average catch per day of 6.8 trout with σ = 2.3 trout.
a. Are the samples independent?
b. Compute a 95% C.I. for µ₁ − µ₂. At a 95% level of confidence ________ < µ₁ − µ₂ < _______.
c. Explain the meaning of part b.

Estimating µ₁ − µ₂ When σ₁ and σ₂ are Unknown
A c% confidence interval for µ₁ − µ₂ is expressed as
(x̄₁ − x̄₂) − E < µ₁ − µ₂ < (x̄₁ − x̄₂) + E
This interval is the output of the TI-83 function 2-SampTInt.

TI-83 function 2-SampTInt (STAT / TESTS / 0: 2-SampTInt)
Input: x̄₁, s₁, n₁, x̄₂, s₂, n₂, c-level, pooled: yes
Output: Interval from (x̄₁ − x̄₂) − E to (x̄₁ − x̄₂) + E
where E is one half the interval length output by the 2-SampTInt function.

Example 10
Suppose that a random sample of 29 college students was divided into two groups. The first group had 15 people and was given 1/2 liter of red wine before going to sleep. The second group of 14 people was not given alcohol before going to sleep. Both groups went to sleep at 11 p.m. The average brain wave activity (in hertz) between 4 and 6 a.m. was measured for each participant. The results follow:
Group 1: 16.0 19.6 19.9 20.9 20.3 20.1 16.4 20.6 20.1 22.3 18.8 19.1 17.4 21.1 22.1
x̄₁ = 19.65 Hz, s₁ = 1.86 Hz
Group 2: 8.2 5.4 6.8 6.5 4.7 5.9 2.9 7.6 10.2 6.4 8.8 5.4 8.3 5.1
x̄₂ = 6.59 Hz, s₂ = 1.91 Hz
a. Are the groups independent?
b. Compute the 90% C.I. for µ₁ − µ₂ and write it in probability notation.
c. Summarize the results of part b in a single sentence relevant to this application.

Guided Exercise 9
a. A study reported a 90% confidence interval for the difference of the means to be 10 < µ₁ − µ₂ < 20. What can you conclude about the values of µ₁ and µ₂?
b. A study reported a 95% confidence interval for the difference of proportions to be −0.32 < p₁ − p₂ < 0.16.
What can you conclude about the values of p₁ and p₂?

Confidence Interval for p₁ − p₂ (Large Samples)
If n₁p̂₁ > 5, n₁q̂₁ > 5, n₂p̂₂ > 5, and n₂q̂₂ > 5, then the c% confidence interval for p₁ − p₂ is expressed as
(p̂₁ − p̂₂) − E < p₁ − p₂ < (p̂₁ − p̂₂) + E
where E is the maximum error in using p̂₁ − p̂₂ as an estimate for p₁ − p₂ at a c% confidence level.

TI-83 function 2-PropZInt (STAT / TESTS / B: 2-PropZInt)
Input: r₁ = x₁, n₁, r₂ = x₂, n₂, c-level
Output: Interval from (p̂₁ − p̂₂) − E to (p̂₁ − p̂₂) + E
where E is one half the interval length.

Exercise 14
The burn center at Community Hospital is experimenting with a new plasma compress treatment. A random sample of 316 patients with minor burns received the plasma compress treatment. Of these patients, 259 had no visible scars after treatment. Another random sample of 419 patients with minor burns received no plasma compress treatment. Of this group, 94 had no visible scars. Let p₁ be the proportion of patients who received the plasma compress treatment and had no visible scars after treatment. Let p₂ be the proportion of patients who did not receive the plasma compress treatment but still had no visible scars.
a. Find the 95% confidence interval for p₁ − p₂.
b. Summarize the results in a single sentence relevant to this application.

9.1 Hypothesis Testing
a. A statistical hypothesis, or simply a hypothesis, is an assumption about a population parameter.
b. Hypothesis testing is the procedure whereby we decide to “reject” or “fail to reject” a hypothesis.
c. Null hypothesis H₀: This is the hypothesis (assumption) under investigation or the statement being tested. The null hypothesis is a statement that “there is no effect,” “there is no difference,” or “there is no change.” The possible outcomes in testing a null hypothesis are “reject” or “fail to reject.”
d.
Alternate hypothesis H₁: This is a statement you will adopt if there is strong evidence (sample data) against the null hypothesis. A statistical test is designed to assess the strength of the evidence (data) against the null hypothesis.
e. Fail to Reject H₀: We never say we “accept H₀”; we can only say we “fail to reject” it. Failing to reject H₀ means there is NOT enough evidence in the data and in the test to justify rejecting H₀. So, we retain H₀ knowing we have not proven it true beyond all doubt.
f. Rejecting H₀: This means there IS significant evidence in the data and in the test to justify rejecting H₀. When H₀ is rejected the data is said to be statistically significant. We adopt H₁ knowing we will occasionally be wrong.

Example 1
A car manufacturer advertises a car that gets 47 mpg. Let µ be the mean mileage for this model. You assume that the dealer will not underrate the mileage, but suspect he may overrate the mileage.
a. What can be used for H₀?
b. What can be used for H₁?

Guided Exercise 1A
A company that manufactures ball bearings claims the average diameter is 6 mm. To check that the average diameter is correct, the company decides to formulate a statistical test.
a. What can be used for H₀?
b. What can be used for H₁?

Guided Exercise 1B
A consumer group wants to test the truth of a package delivery company’s claim that it takes an average of 24 hours to deliver a package. Complaints have led the consumer group to suspect the delivery time is longer than 24 hours.
a. What can be used for H₀?
b. What can be used for H₁?

Types of Tests: Left-Tailed, Right-Tailed, Two-Tailed
The null hypothesis generally states that the parameter of interest equals a specific value, typically a historical value or a value of no change. For example, H₀: µ = k.
There are three types of statistical tests, which are determined by the alternate hypothesis as follows:
Left-Tail Test: H₀: µ = k, H₁: µ < k
Right-Tail Test: H₀: µ = k, H₁: µ > k
Two-Tail Test: H₀: µ = k, H₁: µ ≠ k
[Figure: three x̄-distributions, one for each test, each centered at µ = k, with the region labeled FTR H₀ (fail to reject H₀) marked on the x̄-axis.]

Level of Significance
The level of significance α is the probability we are willing to risk rejecting H₀ when it is true; it is typically between 1% and 5%. In the above pictures, think of α as the predetermined maximum area in the tail(s). Since H₀: µ = k is a statement of “no change,” and is assumed true, we reject H₀ only if we take a random sample and the sample mean x̄ is so far away from the assumed mean (H₀: µ = k) that it is statistically unlikely that the assumption µ = k can be true. That is, the area in the tail(s) must be less than or equal to the level of significance α to reject H₀.

Example 2
Let x be a random variable that represents the heart rate in beats per minute of Rosie, an old sheep dog. From past experience the vet knows that x is normally distributed with a mean of 115 bpm and a standard deviation of σ = 12 bpm. Over the past several weeks Rosie’s heart rate (beats / min) was measured at
93 109 110 89 112 117
The sample mean is x̄ = 105.0. The vet is concerned that Rosie’s heart rate may be slowing. At a 5% level of significance, does the data indicate this is the case?
a. Establish the null hypothesis (i.e., nothing has changed) and the alternate hypothesis.
b. Draw the x̄-distribution. Compute the probability of obtaining a sample mean of 105 bpm or less when the population mean is 115 bpm (by assumption). This area in the tail is called the P-value.
c. What can you conclude about Rosie’s heartbeat?

P-value
Assuming H₀ is true, the probability that the test statistic will take on values as extreme or more extreme than the observed test statistic is called the P-value of the test.
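As an illustration of this definition, the left-tail area asked for in Example 2 can be sketched in Python. This is a minimal sketch under the assumptions stated in that example (H₀: µ = 115 bpm, known σ = 12 bpm, and an x̄-distribution that is normal with standard deviation σ/√n); the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Rosie's recorded heart rates (beats per minute) from Example 2.
heart_rates = [93, 109, 110, 89, 112, 117]

mu_0 = 115    # mean assumed under the null hypothesis
sigma = 12    # known population standard deviation
alpha = 0.05  # level of significance

n = len(heart_rates)
x_bar = sum(heart_rates) / n

# Under H0 the sample mean is normal with mean mu_0 and std dev sigma/sqrt(n).
sampling_dist = NormalDist(mu=mu_0, sigma=sigma / sqrt(n))

# Left-tail test: the P-value is the area to the left of the observed x_bar.
p_value = sampling_dist.cdf(x_bar)

print(f"x_bar = {x_bar:.1f}, P-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

For a right-tail test the area would instead be 1 minus the same cdf value, and for a two-tail test the smaller tail area would be doubled.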
The smaller the P-value computed from the sample data, the stronger the evidence against H0. In the x̄-distributions below, the P-value is the total area in the tail(s).

[Figure: x̄-distributions for the left-tail, right-tail, and two-tail tests, each centered at µ = k, with the tail areas labeled “Area = P-value” (one-tailed tests) and “Area = P-value / 2” (each tail of the two-tailed test).]

Type I and Type II Errors
A Type I error occurs when we reject a true null hypothesis H0. A Type II error occurs when we “fail to reject” a false null hypothesis H0. For a given sample size, reducing the probability of a type I error increases the probability of a type II error, and vice versa.

The probability of a type I error we are willing to accept in an application is called the level of significance, denoted α (alpha). Alpha is specified in advance.
α = P(making a type I error) = P(rejecting a true H0)
For example, if α = 0.05, then we say we are using a 5% level of significance. This means that in 100 similar situations where H0 is true, H0 will be rejected 5 times (on average) when it should not have been.

Example 3
Reconsider Example 1, where H0: µ = 47 mpg and H1: µ < 47 mpg.
a. Suppose α = 0.05. Describe a type I error and its probability.
A type I error is rejecting a true null hypothesis; in this case, rejecting the dealer’s claim that µ = 47 mpg and concluding that µ < 47 mpg when in fact the average number of miles per gallon is 47 or higher. P(type I error) = 0.05.
b. Describe a type II error.
A type II error is failing to reject a false null hypothesis. In this case we “fail to reject” the manufacturer’s claim that µ = 47 mpg when in fact µ < 47 mpg.

Guided Exercise 2
Recall the ball-bearing example, where H0: µ = 6 mm and H1: µ ≠ 6 mm. Suppose α = 0.01.
a. Describe a type I error and its consequences and probability.
The probability of a type I error is 1%, the level of significance.
A type I error would mean that we rejected the manufacturer’s claim that µ = 6 mm when in fact the average diameter was 6 mm. The consequence of a type I error in this application would be needless adjustment and delay in the manufacturing process.
b. Describe a type II error and its consequences.
A type II error would mean that we “failed to reject” the manufacturer’s claim that µ = 6 mm when in fact µ ≠ 6 mm. The consequence of a type II error in this application would be the production of many bearings that do not meet specifications.

Statistical Test Conclusions and Meanings
For a given, preset level of significance α and a P-value computed from the sample data:
1. If P-value ≤ α, then H0 is rejected. That is, there is enough evidence in the [sample] data to reject H0. This means we choose the alternate hypothesis H1 knowing we have not proven H1 beyond all doubt.
2. If P-value > α, then we fail to reject H0. That is, there is not enough evidence in the [sample] data to reject H0. This means we retain H0 knowing we have not proven it beyond all doubt.

Example 4
A car manufacturer advertises a car that gets 47 mpg. Suppose that we sampled 40 cars and found a mean gas mileage of 46.26 mpg. The standard deviation is σ = 2.22 mpg. Test the manufacturer’s claim at a 5% level of significance (α = 0.05).
a. Establish the null and alternate hypotheses.
H0: µ = 47 mpg
H1: µ < 47 mpg
b. Draw the normal x̄-distribution and show the null hypothesis and sample statistic on the axis. Label the axis; include the units.
c. Compute the P-value.
P-value = normalcdf(0, 46.26, 47, 2.22/√40) = 0.0175
d. Conclude the test. Interpret its meaning in this application.
The P-value is 0.018. Since P-value = 0.018 ≤ α = 0.05, we reject H0, which means that at a 5% level of significance the sample data are significant and support the conclusion that the mean car mileage is less than 47 mpg.
e. Repeat part d, but test the manufacturer’s claim at a 1% level of significance (α = 0.01).
The P-value is 0.018. Since P-value = 0.018 > α = 0.01, we fail to reject H0, which means that at a 1% level of significance the sample data are not strong enough to say the mean car mileage is less than 47 mpg.

9.1 Homework
1. Do problems 1-8 all.
2. On problems 9-14 follow these steps:
(a) Write the null and alternate hypotheses. Include units.
(b) Compute the standard error σx̄ = σ/√n. Then sketch the normal curve and the area under the curve that represents the P-value. Label the axis to include the assumption in the null hypothesis and 3 standard deviations on both sides. Include units.
(c) Compute the P-value (without using the ZTest function).
(d) Conclude the test. That is, if the P-value ≤ α, then reject H0; otherwise do not reject H0.
(e) Summarize the results.

Example
(a) H0: µ = 47 mpg; H1: µ < 47 mpg
(b) [Figure: normal curve over the x̄ axis (mpg), centered at 47 (from H0), with tick marks at 46.65, 46.30, and 45.95 and the sample mean x̄ = 46.26 marked.]
(c) P-value = area in the tail(s) = normalcdf(0, 46.26, 47, 2.22/√40) = 0.0175
(d) P-value = 0.0175 < α = 0.05, so reject H0.
(e) At a 5% level of significance the sample data are significant and support the conclusion that the mean car mileage is less than 47 mpg.

9.2 Testing the Mean µ

Example 3
Testing the Mean µ when σ is Known
Some scientists believe sunspot activity is related to drought duration. Let x be a random variable representing the number of sunspots observed in a four-week period. A random sample of 40 such periods in Spanish colonial times gave the following data:
12.5 14.1 37.6 48.3 67.3 70.0 43.8 56.5 59.7 24.0
12.0 27.4 53.5 73.9 104.0 54.6 4.4 177.3 70.1 54.0
28.0 13.0 6.5 134.7 114.0 72.7 81.2 24.1 20.4 13.3
9.4 25.7 47.8 50.0 45.3 61.0 39.0 12.0 7.2 11.3
The sample mean is x̄ ≈ 47.0. Previous studies indicate that σ = 35. It is thought that for thousands of years, the mean number of sunspots per four-week period was about µ = 41.
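As a numerical check of the figures just quoted, the sketch below recomputes the sample mean and the standard error σ/√n for the sunspot data, and then the area to the right of x̄ under the assumption µ = 41. It is a minimal sketch that uses Python's standard-library `statistics.NormalDist` in place of the calculator's normalcdf; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

# Sunspot counts for the 40 four-week periods listed in the example.
sunspots = [
    12.5, 14.1, 37.6, 48.3, 67.3, 70.0, 43.8, 56.5, 59.7, 24.0,
    12.0, 27.4, 53.5, 73.9, 104.0, 54.6, 4.4, 177.3, 70.1, 54.0,
    28.0, 13.0, 6.5, 134.7, 114.0, 72.7, 81.2, 24.1, 20.4, 13.3,
    9.4, 25.7, 47.8, 50.0, 45.3, 61.0, 39.0, 12.0, 7.2, 11.3,
]

mu_0 = 41.0    # long-run mean assumed under H0
sigma = 35.0   # population standard deviation from previous studies

n = len(sunspots)
x_bar = mean(sunspots)         # sample mean
std_error = sigma / sqrt(n)    # standard deviation of the x-bar distribution

# Area to the right of the observed sample mean when mu = 41 is assumed.
right_tail_area = 1 - NormalDist(mu_0, std_error).cdf(x_bar)

print(f"n = {n}, x_bar = {x_bar:.2f}, standard error = {std_error:.3f}")
print(f"Area to the right of x_bar = {right_tail_area:.4f}")
```

On the calculator the same area would be obtained with normalcdf, using a large upper bound in place of infinity.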
Do the data indicate, at a 5% level of significance, that the sunspot activity during the Spanish colonial period was higher than 41?
a. Establish the hypotheses.
b. What does a 5% level of significance mean in this application?
We are willing to tolerate at most a 5% probability of rejecting a true null hypothesis. That is, to reject H0, the probability (computed assuming H0: µ = 41 is true) that a sample mean x̄ is as extreme as or more extreme than our observed sample statistic (x̄ ≈ 47.0) must be at most α = 0.05.
c. Explain the meaning of the P-value in this application.
Assuming H0: µ = 41 is true, the P-value is the probability that a sample mean x̄ is as extreme as or more extreme than our observed sample statistic (x̄ ≈ 47.0).
d. Draw the x̄-distribution. Place the null hypothesis and the observed x̄ on the axis. Then compute the P-value.
e. Conclude the test. That is, if the P-value ≤ α, then reject H0; otherwise do not reject H0.
f. Interpret your results.

9.2 Exercises #1-16: Steps to Test the Mean µ
1. Establish H0 and H1:
Left-Tailed Test:   H0: µ = k   H1: µ < k
Right-Tailed Test:  H0: µ = k   H1: µ > k
Two-Tailed Test:    H0: µ = k   H1: µ ≠ k
2. Indicate which test you are using. The output for either test is the P-value.
a. If σ is known, then the convention is to compute the P-value with a normal distribution. The Z-Test uses a normal distribution (STAT / TESTS / 1: Z-Test).
b. If σ is NOT known, then the convention is to compute the P-value with the more conservative Student’s t-distribution (STAT / TESTS / 2: T-Test).
3. Conclude the test: If P-value ≤ α, then the sample data are significant and we reject H0; otherwise we do not reject H0.
4. State your conclusions in the context of the application.

Example 3
A zoo wishes to obtain eggs from a rare river turtle so they can be hatched and raised to preserve the species.
Carol, a staff biologist, finds a nest of 36 eggs she suspects to be from the rare turtle species. Research has shown that the lengths of rare turtle eggs are normally distributed with a population mean of µ = 7.50 cm. Furthermore, the mean length of the eggs of the other (common) turtle species is known to be longer than 7.50 cm. For the sample, the mean length of the 36 eggs is x̄ = 7.74 cm. The standard deviation of all turtle eggs is σ = 1.5 cm. So, Carol is concerned that the eggs may have come from a common turtle species. Do the data indicate that the eggs are from the rare river turtle at a 5% level of significance?
1. Establish H0 and H1.
H0: µ = 7.50 cm
H1: µ > 7.50 cm
2. State the possible conclusions and their interpretations in this application.

Test Conclusion: Fail to reject H0
Interpretation of the Result: At a 5% level of significance the sample data are not strong enough to reject H0. That is, the sample evidence is not strong enough to say the eggs are from the common turtle.

Test Conclusion: Reject H0
Interpretation of the Result: At a 5% level of significance the sample data are statistically significant and sufficient to reject H0, which suggests the eggs are from the common turtle. We will be wrong at most α = 5% of the time.

3. Explain a 5% level of significance in this application. Explain how serious a type I error is in this application.
A 5% level of significance means we are taking a 5% risk of a type I error, that is, a 5% risk of rejecting a true H0. In this application we are only willing to take a 5% chance of rejecting the claim that the eggs are from the rare river turtle when they actually are.
5. Assuming the null hypothesis (H0: µ = 7.50 cm) is true, find the probability that a sample mean is as far as or farther from the assumed mean than the observed test statistic (x̄). That is, find the P-value.
6. Conclude the test.
7. Interpret the results.

Example 5
The drug 6-mP (6-mercaptopurine) is used to treat leukemia. The following data represent the remission times (in weeks) for a random sample of 21 patients using 6-mP.
10 7 32 23 22 6 16 34 32 25 11 20 19 6 17 35 6 13 9 6 10
The sample mean is 17.1 weeks with a sample standard deviation of 10.0 weeks. Let x be a random variable representing the remission times (in weeks) for all patients. Assume the x-distribution is mound-shaped and symmetric. A previous drug treatment had a remission time of 12.5 weeks. At a 1% level of significance, do the data indicate the mean remission time for 6-mP is different (either way)?
1. Establish the hypotheses.
2. Find the P-value of the test statistic and conclude the test. Show your work and/or indicate the test used on your calculator to compute the P-value.
3. Interpret the results.

Example 6
Archeologists become excited when they find an anomaly in a newly discovered artifact. The anomaly may or may not indicate a new trading region or a new method of craftsmanship. Suppose the lengths of arrowheads at a certain site have a mean length of µ = 2.6 cm. A random sample of 61 recently discovered arrowheads in an adjacent cliff dwelling had a sample mean length of 2.92 cm. The standard deviation is σ = 0.85 cm. Do these data indicate that the mean length of arrowheads in the adjacent cliff dwelling is longer than 2.6 cm? Use a 1% level of significance.
1. Establish the hypotheses.
2. Find the P-value of the test statistic and conclude the test. Show your work and/or indicate the test used on your calculator to compute the P-value.
3. Interpret the results.

Example 7
By taking thousands of practice shots at driving ranges, Pam knows her mean distance using a #1 wood is 225 yards with a standard deviation of σ = 25 yards. Taking 100 shots with a new ball, Pam found her sample mean distance was 230 yards. At a 5% level of significance, determine whether Pam improved her driving distance using the new ball.
1. Establish the hypotheses.
2. Find the P-value of the test statistic and conclude the test. Show your work and/or indicate the test used on your calculator to compute the P-value.
3. Interpret the results.

Example 8
A large company with offices around the world occasionally must move its employees from one city to another. From long experience, the company knows its employees move on average once every 8.50 years with a standard deviation of 3.62 years. Recent trends have led some to believe a change might have occurred. A sample of 48 employees were asked the number of years since the company last moved them. The mean time was 7.91 years. Has the mean time between moves significantly changed? Use α = 0.05.
1. Establish the hypotheses.
2. Without using the ZTest, find the P-value of the test statistic and conclude the test. Show your work and/or indicate the test used on your calculator to compute the P-value.
3. Interpret the results.

Guided Exercise 5
Production records show that a machine that makes bottle caps makes caps with a mean diameter of 1.85 cm and a standard deviation of 0.05 cm. An inspector measured a random sample of 64 caps and found a mean diameter of 1.87 cm. At a 1% level of significance, determine whether the machine has slipped out of adjustment.
1. Establish the hypotheses.
2. Without using the ZTest, find the P-value of the test statistic and conclude the test. Show your work and/or indicate the test used on your calculator to compute the P-value.
3. Interpret the results.

Confidence Interval versus Two-Tailed Hypothesis Test
Suppose a two-tailed hypothesis test has a level of significance α and null hypothesis H0: µ = µ0. Let c = 1 − α be the confidence level of a confidence interval for the mean µ based on the sample data. Then
1. H0 is not rejected whenever µ0 falls inside the c confidence interval for the mean µ.
2. H0 is rejected whenever µ0 falls outside the c confidence interval for the mean µ.
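The agreement between a two-tailed test and the c = 1 − α confidence interval can be illustrated numerically. The sketch below is a minimal illustration for the case where σ is known, using the values from Exercise 19 that follows (µ0 = 20, n = 36, x̄ = 22, σ = 4, α = 0.03). The function names are hypothetical, and Python's `statistics.NormalDist` stands in for the calculator's normalcdf and invNorm.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(x_bar, mu_0, sigma, n, alpha):
    """Return (P-value, reject?) for H0: mu = mu_0 versus H1: mu != mu_0."""
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # total area in both tails
    return p_value, p_value <= alpha

def z_confidence_interval(x_bar, sigma, n, c):
    """Return the c confidence interval for mu when sigma is known."""
    z_c = NormalDist().inv_cdf(1 - (1 - c) / 2)    # critical value
    margin = z_c * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

x_bar, mu_0, sigma, n, alpha = 22.0, 20.0, 4.0, 36, 0.03

p_value, reject = two_tailed_z_test(x_bar, mu_0, sigma, n, alpha)
low, high = z_confidence_interval(x_bar, sigma, n, c=1 - alpha)
mu_0_outside = not (low <= mu_0 <= high)

print(f"P-value = {p_value:.4f}, reject H0: {reject}")
print(f"{(1 - alpha):.0%} confidence interval: ({low:.3f}, {high:.3f})")
print(f"mu_0 outside the interval: {mu_0_outside}")
```

Both approaches use the same standard error and the same critical value, which is why the two decisions agree.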
Exercise 19, Section 9.2
Consider a two-tailed hypothesis test with
H0: µ = 20
H1: µ ≠ 20
A random sample of size 36 has a sample mean of 22. It is known that the standard deviation is σ = 4. Use α = 0.03.
a. Use hypothesis testing to see if there is sufficient evidence to reject H0.
b. Solve using a confidence interval.
i. What is the confidence level corresponding to a level of significance of 0.03? Find the ____% confidence interval for the mean µ. We are ____% confident that the population mean µ is between ________ and ________.
ii. Do we reject or fail to reject H0 based on the 97% confidence interval?

9.3 Testing a Proportion p
Setup and Assumptions
1. Let r be the binomial random variable representing the number of successes out of n trials.
2. The sample size n is large enough that the distribution of r can be approximated by a normal distribution. That is, np > 5 and nq > 5.
3. Use p̂ = r/n as the point estimate of the population parameter p, the probability of success.
4. The possible sets of hypotheses are:
Left-Tailed Test:   H0: p = k   H1: p < k
Right-Tailed Test:  H0: p = k   H1: p > k
Two-Tailed Test:    H0: p = k   H1: p ≠ k
5. TI-84: STAT / TESTS / 1-PropZTest
Input:
p0: from H0
x: the number of successes (the r-value)
n: the number of trials
prop: < p0, > p0, or ≠ p0, depending on H1
Output: the P-value
6. Conclude the test: If P-value ≤ α, then the sample data are significant and we reject H0; otherwise we conclude the sample data are not strong enough to reject H0.
7. Summarize your conclusion in the specific situation.

Example 9
A team of eye surgeons has developed a new technique for a risky eye operation to restore the sight of people blinded by a certain disease. Under the old method, only 30% of the patients recovered their eyesight. Surgeons have performed the new technique 225 times and 88 of those patients have recovered their sight.
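The proportion test set up above (the calculator's 1-PropZTest) can be sketched in Python. This is a minimal sketch, not the calculator's implementation: the function name `one_prop_z_test` is hypothetical, and it is applied to the counts just given (r = 88, n = 225, p0 = 0.30) under a right-tailed alternative H1: p > 0.30, which is an assumption about how the hypotheses are set up here.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(r, n, p0, tail):
    """Normal-approximation test of H0: p = p0.

    tail is "left" (H1: p < p0), "right" (H1: p > p0), or "two" (H1: p != p0).
    Returns (p_hat, z, p_value).
    """
    p_hat = r / n                                   # point estimate of p
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)      # standardized test statistic
    std_normal = NormalDist()
    if tail == "left":
        p_value = std_normal.cdf(z)
    elif tail == "right":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return p_hat, z, p_value

r, n, p0 = 88, 225, 0.30

# Check the large-sample conditions np > 5 and nq > 5 before using the test.
assert n * p0 > 5 and n * (1 - p0) > 5

p_hat, z, p_value = one_prop_z_test(r, n, p0, tail="right")
print(f"p_hat = {p_hat:.4f}, z = {z:.3f}, P-value = {p_value:.4f}")
```

The P-value is then compared with the chosen α exactly as in step 6 of the setup.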
Can we justify the claim that the new technique is better than the old one at a 1% level of significance?
1. Establish the hypotheses.
2. Find the P-value of the test statistic and conclude the test. Show the test used on your calculator to compute the P-value.
3. Interpret the results.

Example 10
A botanist has produced a new variety of hybrid wheat that is better able to withstand drought than other varieties. He knows that 80% of the seeds from the parent plants germinate. He claims the hybrid has the same germination rate. To test this claim, 400 seeds from the hybrid plant are tested and 312 germinate. Test the botanist’s claim at a 5% level of significance.
1. Establish the hypotheses.
2. Find the P-value of the test statistic and conclude the test. Show the test used on your calculator to compute the P-value.
3. Interpret the results.

9.4 Tests Involving Paired Differences (Dependent Samples)
Dependent Samples
Dependent samples have data that are naturally paired. They occur in many applications, such as “before and after” situations, where the same object is measured before and after a treatment. In such cases the difference in the two measures is tested.

Examples of Dependent Samples
a. A shoe manufacturer claims that among adults in the United States, the left foot is longer than the right foot.
b. A weekend refresher math course is administered to new students. An exam is administered to each student before and after the course.

Testing the Difference, d, of Paired Data
a. It is assumed the paired data are such that the differences d between the first and second members of each pair are approximately normally distributed with a population mean µd.
b. For a random sample of n data pairs with sample mean d̄ and sample standard deviation sd, the standardized sample mean follows a Student’s t distribution with n − 1 degrees of freedom and can be tested with STAT / TESTS / 2: T-Test.
c. The possible sets of hypotheses to be tested are:
Left-Tailed Test:   H0: µd = 0   H1: µd < 0
Right-Tailed Test:  H0: µd = 0   H1: µd > 0
Two-Tailed Test:    H0: µd = 0   H1: µd ≠ 0
d. TI-83: STAT / TESTS / 2: T-Test
Input:
µ0: from H0
x̄: the mean of the differences, d̄
Sx: the standard deviation of the differences, sd
n: the number of pairs in the sample
µ: < µ0, > µ0, or ≠ µ0, depending on H1
Output: the P-value
e. Conclude the test: If P-value ≤ α, then the sample data are significant and we reject H0; otherwise we conclude the sample data are not strong enough to reject H0.
f. Interpret the results (specific to the application).

Example 10
Heart surgeons know that many patients who undergo heart surgery have a dangerous buildup of anxiety before the operation. Psychiatric counseling may relieve some of that anxiety. The data shown are the anxiety scores of patients before and after counseling. Higher scores mean higher levels of anxiety. Can we conclude that counseling reduces anxiety? Use α = 0.01.
1. Establish the hypotheses.
2. Find the P-value of the test statistic and conclude the test. Show the test used on your calculator to compute the P-value.
3. Interpret the results [specific to the context of the application].

Patient   B: Score before counseling   A: Score after counseling   Difference d = A − B
A         121                          76                          -45
B         93                           93                          0
C         105                          64                          -41
D         115                          117                         2
E         130                          82                          -48
F         98                           80                          -18
G         142                          79                          -63
H         118                          67                          -51
I         125                          89                          -36

Example 11
To test the quality of two brands of tires, one tire of each brand was placed on each of six test cars. After 6 months the amount of wear on each tire was measured in thousandths of an inch. Can we conclude the two tire brands show unequal wear at a 2% level of significance?
1. Establish the hypotheses.
2. Find the P-value of the test statistic and conclude the test. Show the test used on your calculator to compute the P-value.
3. Interpret the results [specific to the context of the application].
Car   Soapstone   Bigyear   Difference d = S − B
1     132         140       -8
2     71          74        -3
3     90          110       -20
4     37          36        1
5     93          105       -12
6     107         119       -12

9.5 Testing µ1 − µ2 and p1 − p2 (Independent Samples)
Samples are independent if there is no relationship whatsoever between specific values of the two distributions.

Example 12
A teacher wishes to compare the effectiveness of two teaching methods. Students are randomly divided into two groups: the first group is taught by method 1 and the second group by method 2. At the end of the course, a comprehensive exam is given to all students. The mean scores, x̄1 and x̄2, of the two groups are compared. Are the samples independent or dependent?

Example 13
A shoe manufacturer claims that for U.S. adults the average length of the left foot is longer than the average length of the right foot. A random sample of 60 adults is drawn and the lengths of both their left and right feet are measured and averaged as x̄1 and x̄2, respectively. Are the samples independent or dependent?

Theorem 9.2
Let x1 have a normal distribution with mean µ1 and standard deviation σ1. Let x2 have a normal distribution with mean µ2 and standard deviation σ2. Suppose random samples of sizes n1 and n2 are taken from the respective distributions. Then the variable x̄1 − x̄2 has
1. A normal distribution.
2. Mean µ1 − µ2.
3. Standard deviation √(σ1²/n1 + σ2²/n2).

Steps for Section 9.5 Problems
1. Establish H0 and H1.
Left-Tailed Test:   H0: µ1 = µ2   H1: µ1 < µ2
Right-Tailed Test:  H0: µ1 = µ2   H1: µ1 > µ2
Two-Tailed Test:    H0: µ1 = µ2   H1: µ1 ≠ µ2
2. Indicate which test you are using.
a. If σ1 and σ2 are known, then the convention is to compute the P-value with a normal distribution. The 2-SampZTest uses a normal distribution (STAT / TESTS / 3: 2-SampZTest).
b. If σ1 and σ2 are not known, then the convention is to compute the P-value with the more conservative Student’s t-distribution (STAT / TESTS / 4: 2-SampTTest).
Input the sample standard deviation s.
3. Conclude the test: If P-value ≤ α, then the sample data are significant and we reject H0; otherwise we conclude the sample data are not strong enough to reject H0.
4. Interpret the results [specific to the context of the application].

Example 14
A consumer group measures the heating capacity of camp stoves by measuring the time it takes the stove to boil 2 quarts of water from 50°F. Two competing models were tested:
Model 1: x̄1 = 11.4 min, σ1 = 2.5 min, n1 = 10
Model 2: x̄2 = 9.9 min, σ2 = 2.5 min, n2 = 12
Is there a difference in the performance of the two models at a 5% level of significance?
1. Establish the hypotheses.
2. Find the P-value of the test statistic and conclude the test. Show the test used on your calculator to compute the P-value.
3. Interpret the results [specific to the context of the application].

Example 15
Two competing headache remedies claim to give fast-acting relief. An experiment was performed to compare the mean lengths of time required for bodily absorption of brand A and brand B:
Brand A: x̄1 = 21.8 min, s1 = 8.7 min, n1 = 12
Brand B: x̄2 = 18.9 min, s2 = 7.5 min, n2 = 12
Assuming both distributions are approximately normal, test the claim that there is no difference in the mean time required for bodily absorption.
1. Establish the hypotheses.
2. Find the P-value of the test statistic and conclude the test. Show the test used on your calculator to compute the P-value.
3. Interpret the results [specific to the context of the application].

Testing Two Proportions p1 and p2
STAT / TESTS / 6: 2-PropZTest

Example 16
The Macek County Clerk wishes to improve voter registration. One method under consideration is to send reminders in the mail to all citizens in the county who are eligible to register. A random sample of 1250 potential registered voters was taken.
Group 1: There were 625 people in this group. No reminders to register were sent to them.
The number of potential voters from this group who registered was 295.
Group 2: There were 625 people in this group. Reminders to register were sent to them. The number of potential voters from this group who registered was 350.
At a 5% level of significance, did reminders improve voter registration?
1. Establish the hypotheses.
2. Find the P-value of the test statistic and conclude the test. Show the test used on your calculator to compute the P-value.
3. Interpret the results [specific to the context of the application].

Guided Exercise 11
The Macek County Clerk wishes to improve voter registration. One method under consideration is to send reminders in the mail to all citizens in the county who are eligible to register. A random sample of 1100 potential registered voters was taken.
Group 1: There were 500 people in this group. No reminders to register were sent to them. The number of potential voters from this group who registered was 248.
Group 2: There were 600 people in this group. Reminders to register were sent to them. The number of potential voters from this group who registered was 332.
At a 1% level of significance, did reminders improve voter registration?
1. Establish the hypotheses.
2. Find the P-value of the test statistic and conclude the test. Show the test used on your calculator to compute the P-value.
3. Interpret the results [specific to the context of the application].

TI-83/84 STAT / TESTS menu (menu item, section, description)
1: Z-Test (Section 9.2): Testing the mean µ when σ is known. Be able to do these problems without using the Z-Test function. That is, sketch the distribution and compute the P-value using the normalcdf function.
2: T-Test (Sections 9.2, 9.4): Testing the mean µ when σ is not known, or testing dependent paired data with µd = 0.
3: 2-SampZTest (Section 9.5): Testing two means, µ1 − µ2, when σ1 and σ2 are known.
4: 2-SampTTest (Section 9.5): Testing two means, µ1 − µ2, when σ1 and σ2 are not known.
5: 1-PropZTest (Section 9.3): Testing a proportion p.
6: 2-PropZTest (Section 9.5): Testing two proportions.
7: ZInterval (Section 8.1): Estimating µ when σ is known. Be able to do these problems without using the ZInterval function. That is, sketch the distribution and compute the interval using the invNorm function.
8: TInterval (Section 8.2): Estimating µ when σ is not known.
9: 2-SampZInt (Section 8.5): Estimating µ1 − µ2 when σ1 and σ2 are known.
0: 2-SampTInt (Section 8.5): Estimating µ1 − µ2 when σ1 and σ2 are not known.
A: 1-PropZInt (Section 8.3): Estimating p in the binomial distribution.
B: 2-PropZInt (Section 8.5): Estimating p1 − p2.

10.1 Paired Data and Scatter Diagrams
Linear Equations
Linear equations (or linear functions) graph as straight lines and can be written in the form
y = bx + a
where (0, a) is the y-intercept, and
b = slope of the line = rise/run = (y2 − y1)/(x2 − x1)

Example A
a. Identify the slope and y-intercept in each of the equations.
b. Graph each equation using the y-intercept and the slope.
c. Graph each equation using the TI-83.

Equation y = bx + a    Slope b    y-intercept (0, a)
y = 2x − 5
y = −(2/3)x + 3
y = 5
y = −x

[Figure: two blank coordinate grids with x- and y-axes running from −4 to 4.]

Scatter Diagram; Explanatory & Response Variables
A scatter diagram is a plot of ordered-pair (x, y) data. We call x the explanatory variable and y the response variable.

Example 1
Phosphorus, a chemical in many household and industrial cleaning compounds, often finds its way into surface water. A random sample of eight sites in California wetlands gave the following information about phosphorus reduction in drainage water. In this study, x represents the phosphorus concentration (in 100 mg/l) at the inlet of a bio-treatment facility and y represents the phosphorus concentration at the outlet of the facility.
a. Make a scatter diagram of the data. Label and scale the axes. Then draw a “best fit” linear model through the data.
b. Do x and y appear to be linearly related?
c. Use the linear model (line) to predict the outlet concentration of phosphorus if the inlet concentration is 700 mg/l.
d. Use the linear model to predict the inlet concentration of phosphorus if the outlet concentration is 200 mg/l.

x   5.2   7.3   6.7   5.9   6.1   8.3   5.5   7.0
y   3.3   5.9   4.8   4.5   4.0   7.1   3.6   6.1

Linear Correlation
If the scatter plot of ordered-pair data, represented by variables x and y, trends roughly into a straight line, then we say that x and y are linearly correlated. Linear correlation is classified in two general ways:
1. Degree: none, low-moderate, high, perfect
Perfect linear correlation means that all (x, y) ordered pairs of data lie on the same straight line.
2. Sign or slope: positive or negative
i. Positive linear correlation means that high values of x correlate with high values of y, and low values of x correlate with low values of y. The graph has a positive slope (and trends upward from left to right).
ii. Negative linear correlation means that high values of x correlate with low values of y, and low values of x correlate with high values of y. The graph has a negative slope (and trends downward from left to right).
See figures 10-1, 10-2, 10-4, Guided Exercise 2, Table 2, and Exercises 1 & 2.

Guided Exercise 1
An industrial plant has 7 divisions that do the same type of work. A safety inspector tracks x = “the number of work-hours devoted to safety training” and y = “the number of work-hours lost due to accidents.” The results are shown.
(a) Make a scatter diagram for the data.
(b) Make a scatter diagram on your calculator. Enter the x-values into L1 and the y-values into L2. Turn the STAT PLOT on and adjust the window settings.
(c) Draw a “best fit” line through the data and classify the linear correlation as (i) none, low-moderate, high, or perfect, and (ii) positive or negative.
(d) Use your linear model (i.e., read from the line) to predict the number of safety training hours needed so that 20 work-hours are lost due to accidents.
(e) Use your linear model (i.e., read from the line) to predict the number of work-hours lost due to accidents when 30 hours are spent on safety training.

Division   x      y
1          10.0   80
2          19.5   65
3          30.0   68
4          45.0   55
5          50.0   35
6          65.0   10
7          80.0   12

Sample Correlation Coefficient r
The correlation coefficient r is a unitless numerical measure that assesses the strength of the linear relationship between two variables x and y. See table 10-2.
1. −1 ≤ r ≤ 1
2. If r = 1, there is perfect positive linear correlation.
3. If r = −1, there is perfect negative linear correlation.
4. The closer r is to 1 or −1, the better a line describes the relationship between x and y.
5. If r is positive, then as x increases, y increases.
6. If r is negative, then as x increases, y decreases.
7. The value of r is the same regardless of which variable is the explanatory variable and which is the response variable. Data plotted as (x, y) and (y, x) will have the same value for r.

Computation of r
1. Turn DiagnosticOn in the CATALOG menu.
2. Enter the x-values into L1 and the y-values into L2.
3. STAT / CALC / 4: LinReg(ax+b) Lx, Ly, Y1
4. Highlight Calculate and press ENTER. Then scroll down to find the value of r.

Guided Exercise 1
(f) Find the sample correlation coefficient for the data in Guided Exercise 1.

Sample versus Population Correlation Coefficient
r = sample correlation coefficient computed from a random sample of (x, y) data pairs.
ρ = population correlation coefficient computed from all population data pairs (x, y).

Lurking Variables
In ordered pairs (x, y), x is called the explanatory variable and y is called the response variable. When r indicates a linear correlation between x and y, a change in the values of y tends to respond to changes in values of x according to a linear model.
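The sample correlation coefficient r described above, which the calculator reports from LinReg, can also be computed directly from the data. The sketch below is a minimal illustration using the Guided Exercise 1 data; the function name `correlation` is hypothetical, and the formula used is the usual one, r = Sxy / √(Sxx · Syy), built from deviations about the means.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation coefficient r for paired data (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Guided Exercise 1: x = work-hours of safety training,
# y = work-hours lost due to accidents, for the 7 divisions.
training_hours = [10.0, 19.5, 30.0, 45.0, 50.0, 65.0, 80.0]
hours_lost = [80, 65, 68, 55, 35, 10, 12]

r = correlation(training_hours, hours_lost)
r_swapped = correlation(hours_lost, training_hours)

print(f"r = {r:.3f}")
print(f"r with x and y swapped = {r_swapped:.3f}")
```

Swapping the roles of x and y leaves r unchanged, which matches property 7 in the list of properties of r.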
A lurking variable is a variable that is neither an explanatory nor a response variable. Yet, a lurking variable may be responsible for changes in both x and y. Correlation does not necessarily mean causation.

Example 3
It has been observed in a certain community that over the years the correlation between x, the number of people going to church, and y, the number of people in jail, was r = 0.90. Does going to church cause people to go to jail, or vice versa? Explain.

10.2 Linear Regression and the Coefficient of Determination
Least-Squares Linear Regression Line
The least-squares linear regression line is the line that fits the (x, y) data points in such a manner that the sum of the squares of all the vertical distances from the data points to the line is as small as possible. The point (x̄, ȳ) is always on the least-squares regression line.

Computing the Linear Regression Line on the TI-83/84
STAT / CALC / 4: LinReg(ax+b) Lx, Ly, Y1
Output: a and b in y = bx + a. Note that LinReg(ax+b) displays its result as y = ax + b, so the calculator’s a is the slope and its b is the y-intercept; they correspond to b and a, respectively, in y = bx + a.

Example 4
In Denali National Park, Alaska, the wolf population is dependent on the caribou population. Let x represent the caribou population (in hundreds) and y represent the wolf population. A random sample in recent years gave the following information.
(a) Identify the explanatory and response variables.
(b) Make a scatter diagram of the data.
(c) Find the linear regression line for the data. Graph the least-squares regression line (LSRL) and label at least 2 points on the line.
(d) Interpret the slope of the line in this application.
(e) Predict the size of the wolf population when the caribou population is 2100. Is this interpolation or extrapolation?
(f) Predict the size of the wolf population when the caribou population is 4000. Is this interpolation or extrapolation?
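The least-squares line for Example 4 can be sketched in Python as a check on the calculator output. This is a minimal sketch using the caribou and wolf data tabulated with the example; the function name `least_squares_line` is hypothetical, and the slope and intercept follow the notes' convention y = bx + a, with b = Sxy / Sxx and a = ȳ − b·x̄.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = b*x + a."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx            # slope
    a = y_bar - b * x_bar      # intercept, so the line passes through (x_bar, y_bar)
    return a, b

# Example 4 data: x = caribou population (in hundreds), y = wolf population.
caribou = [30, 34, 27, 25, 17, 23, 20]
wolves = [66, 79, 70, 60, 48, 55, 60]

a, b = least_squares_line(caribou, wolves)

# Part (e): a caribou population of 2100 corresponds to x = 21 (hundreds).
x_new = 21
predicted_wolves = b * x_new + a

# A prediction is interpolation when x_new lies within the observed x-values.
is_interpolation = min(caribou) <= x_new <= max(caribou)

print(f"y = {b:.3f}x + {a:.2f}")
print(f"Predicted wolves at x = {x_new}: {predicted_wolves:.1f}")
print(f"Interpolation: {is_interpolation}")
```

The same check applied to x = 40 (4000 caribou) would flag the prediction as extrapolation, since 40 lies outside the observed range of x.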
x   30   34   27   25   17   23   20
y   66   79   70   60   48   55   60

Coefficient of Determination r²

If r is the correlation coefficient, then r² is called the coefficient of determination and

$$r^2 = \frac{\text{explained variation}}{\text{total variation}}$$

1. The value of r² is the ratio of explained variation to total variation. That is, r² is the fraction of the total variation in y that can be explained by using the linear model y = bx + a with x as the explanatory variable.
2. 1 − r² is the fraction of the total variation in y that is due to random chance or to the possibility of lurking variables that influence y.

Example 4A
(a) Find r and r² for Example 4.
(b) Explain the value of r² in this application.
(c) Explain the value of 1 − r² in this application.

Change in Directions for 10.2 Exercises

Do the following in problems 7-18:
(a) View the scatter diagram of the data on your calculator to verify a linear model is appropriate.
(b) Find a and b for the least-squares regression line y = bx + a. Then find r, r², x̄, and ȳ.
(c) Graph the regression line from part (b). Be sure the point (x̄, ȳ) is on the graph.
(d) Interpret the value of r² in one sentence relevant to the application. Interpret the value of 1 − r² in one sentence relevant to the application.

Guided Exercise 3
Quick Sell car dealership runs 1-minute TV advertisements and tracks x = “the number of ads that week” and y = “the number of cars sold that week.”

x    6   20    0   14   25   16   28   18   10    8
y   15   31   10   16   28   20   40   25   12   15

Complete steps (a) through (d). Then find the predicted number of cars sold per week if the budget only allows 12 ads to be run per week.
(a) View the scatter diagram of the data on your calculator to verify a linear model is appropriate.
(b) Find a and b for the least-squares regression line y = bx + a. Then find r, r², x̄, and ȳ.
(c) Graph the regression line from part (b). Be sure the point (x̄, ȳ) is on the graph.
(d) Interpret the value of r² in one sentence relevant to the application. Interpret the value of 1 − r² in one sentence relevant to the application.

10.3 Testing the Correlation Coefficient

The population correlation coefficient ρ (rho, read “row”) is estimated by the statistic r. If we assume the variables x and y are normally distributed and want to test whether they are correlated in the population, then we set the null hypothesis to say they are not correlated:

H₀: ρ = 0 (x and y are not correlated), tested at the given level of significance.

Theorem
Let random variables x and y be normally distributed. If ρ = 0 (as assumed in the null hypothesis), then the distribution of sample correlation coefficients (the r values) is normally distributed about r = 0.

[Figure: distribution of r-values when ρ = 0, plotted along the r-axis]

Example 6
Do college graduates have an improved chance of a better income? Let x = percentage of the population 25 or older with at least 4 years of college and y = percentage growth in per capita income over the past seven years. A random sample of six communities in Ohio gave the information in the table.

x    9.9   11.4    8.1   14.7    8.5   12.6
y   37.1   43     33.4   47.1   26.5   40.2

(a) Find the correlation coefficient r.
(b) Test to see if the correlation coefficient is positive at a 1% level of significance.
(c) Summarize your conclusions in one sentence relevant to this application.

10.3 Homework
Do Exercises 7-12, parts (b) and (d) only.

Exercise 10.3 #3
What is the optimal time for a scuba diver to be at the bottom of the ocean? The navy defines optimal time to be the time at each depth for the best balance between length of work period and decompression time after surfacing.
Let x = depth of a dive in meters, and y = optimal time in hours. A random sample of divers gave the following data.

x   14.1   24.3   30.2   38.3   51.3   20.5   22.7
y   2.58   2.08   1.58   1.03   0.75   2.38   2.20

(b) Use a 1% level of significance to test the claim that ρ < 0.
(d) Find the predicted optimal time for a dive depth of 18 meters.

11.1 Chi Square, χ²: Tests for Independence

Example 1
A keyboard manufacturer wants to know: “Is the time a new student takes to learn to type independent of the arrangement of the letters on the keyboard?” Three hundred beginning typing students were randomly assigned to learn to type 20 wpm on three different keyboards. The observed data are in the table below. Use a 1% level of significance.

Keyboard   21-40 h   41-60 h   61-80 h   Total
A          25        30        25        80
B          30        71        19        120
Standard   35        49        16        100
Total      90        150       60        300

1. The company would test the following hypotheses:
   H₀: Keyboard letter arrangement and learning times are independent.
   H₁: Keyboard letter arrangement and learning times are not independent.
2. Enter the contingency table into a matrix.
   i. MATRIX / EDIT / [A]
   ii. Enter the number of rows (3) and the number of columns (3).
   iii. Enter each element of the contingency table row by row.

$$A = \begin{bmatrix} 25 & 30 & 25 \\ 30 & 71 & 19 \\ 35 & 49 & 16 \end{bmatrix}$$

3. Compute the p-value using STAT / TESTS / C: χ²-Test
4. State your conclusion in a sentence relevant to the problem.

11.1 Exercise 8
After a large fund drive for the City Library, the following information was obtained from a random sample of contributors to the library.
Use a 1% level of significance to test the claim that “the amount contributed to the library is independent of ethnic group.”

Ethnic Group   1-50 ($)   51-100 ($)   101-150 ($)   151-200 ($)   201+ ($)   Total
A              310        715          201           105           42         1373
B              619        511          312           97            22         1561
C              402        624          217           88            35         1366
D              544        571          309           79            29         1532
Total          1875       2421         1039          369           128        5832

11.2 Chi Square, χ²: Goodness of Fit

Example 1
Last year management listed five items and asked each employee to mark the one item most important to him/her. The percentage results are in the third column of the table. This year the managers asked 500 employees the same thing and observed the results in column 2. Test to see if this year's distribution “fits” last year's at a 1% level of significance.

Item         Observed, O   Expected %   Expected, E   (O − E)²/E
Vacation     30            4%
Salary       290           65%
Safety       70            13%
Retirement   70            12%
Overtime     40            6%
Total        500           100%         500

1. Set up the χ² “goodness-of-fit” hypotheses:
   H₀: The population fits the given distribution (i.e., last year's).
   H₁: The population this year has a different distribution than last year's.
2. Compute the χ² value.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

3. Sketch the χ²-distribution and find the critical value χ²_α (the minimum χ² required to reject H₀) from Table 8. The degrees of freedom df = (number of data rows) − 1.
4. State your conclusions.

11.2 Exercise 2
The household types for the entire United States and for a random sample of 411 households in Dove Creek are given in the table. Test to see if Dove Creek's distribution of households is the same as the U.S. distribution at a 5% level of significance.

Item                   U.S. %   Number in Dove Creek   Expected, E   (O − E)²/E
Married w/children     26%      102
Married w/o children   29%      112
Single Parent          9%       33
One Person             25%      96
Other                  11%      68

1. Set up the χ² “goodness-of-fit” hypotheses:
   H₀: The distribution of household type in Dove Creek fits the distribution in the U.S.
   H₁: The distribution of household type in Dove Creek is different from the distribution in the U.S.
2. Compute the χ² value.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

3. Sketch the χ²-distribution and find the critical value χ²_α (the minimum χ² required to reject H₀) from Table 8. The degrees of freedom df = (number of data rows) − 1.
4. State your conclusions.
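The goodness-of-fit steps above can also be checked by hand or in a few lines of code. The following is a minimal Python sketch, not part of the original notes: the function name `goodness_of_fit` is illustrative, and the critical value 13.28 (α = 0.01, df = 4) is taken from a standard chi-square table rather than computed. It fills in the Expected column as E = n × (expected %) and sums (O − E)²/E for the Example 1 data.

```python
def goodness_of_fit(observed, expected_pct):
    """Return (chi2, df, expected) for a chi-square goodness-of-fit test.

    observed     -- observed counts O, one per category
    expected_pct -- expected percentages, one per category (should sum to 100)
    """
    n = sum(observed)
    # Expected count in each category: E = n * (expected % / 100)
    expected = [n * p / 100 for p in expected_pct]
    # Test statistic: sum of (O - E)^2 / E over all categories
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # (number of data rows) - 1
    return chi2, df, expected


# Example 1: Vacation, Salary, Safety, Retirement, Overtime
observed = [30, 290, 70, 70, 40]
expected_pct = [4, 65, 13, 12, 6]

chi2, df, expected = goodness_of_fit(observed, expected_pct)

# Assumed critical value from a standard chi-square table (alpha = 0.01, df = 4)
CRITICAL_VALUE = 13.28

print("Expected counts:", expected)
print(f"chi-square = {chi2:.2f} with df = {df}")
print("Reject H0" if chi2 > CRITICAL_VALUE else "Fail to reject H0")
```

With the Example 1 data the statistic comes out near 14.15, which exceeds the 1% critical value for df = 4, so under these assumptions H₀ would be rejected. The same function applies to Exercise 2 by passing the Dove Creek counts and the U.S. percentages, with the critical value replaced by the 5% entry for df = 4.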