Chapter 1 Notes - Applied Statistics

March 17, 2018 | Author: Abdul Rajak AR | Category: Normal Distribution, Standard Deviation, Confidence Interval, Sample Size Determination, Variance






CHAPTER 1: PARAMETER ESTIMATION

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

INTRODUCTION

Parameter estimation is the first step in inferential statistics: it is the process of estimating the value of a parameter using information obtained from a sample. The process of acquiring information from samples and using that information to draw conclusions about populations is called statistical inference. Statistical inference requires the skills and knowledge of descriptive statistics, probability distributions and sampling distributions. The process is summarized in Figure 1.1.

The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic. There are two approaches to parameter estimation:
i) Point estimation. A point estimate is a single value, so it is either 100% accurate or 100% different from the true value (note that the true value is the parameter value).
ii) Interval estimation.

An estimator is the statistic used to obtain a point estimate. An estimate is a specific value, or range of values, used to approximate a population parameter.

Why do we estimate? Can we get the exact value from the population?

Figure 1.1: Relationship between parameter and statistic. A census collects information/data from the whole population (e.g. all UUM students); the population measurement is the PARAMETER. A survey collects information/data from a sample taken at random, i.e. part of the population (e.g. a number of UUM students); the sample measurement is the STATISTIC, which is used to estimate the parameter.

A point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value or point.

POINT ESTIMATION

A point estimate is a specific numerical value (a single point) used to approximate a population parameter.
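The point that a single point estimate is rarely exactly equal to the parameter can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the population (1000 values drawn from a normal distribution with mean 50 and standard deviation 10), the sample size of 25 and the helper name `point_estimates` are hypothetical and not taken from the notes. Each sample mean $\bar{x}$ is one point estimate of the population mean $\mu$, and each misses $\mu$ by a different amount.

```python
import random
import statistics


def point_estimates(population, n, repeats, seed=1):
    """Draw `repeats` simple random samples of size n (without replacement)
    and return the sample mean of each one: one point estimate per sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]


# Hypothetical population: 1000 values from a normal distribution (mean 50, sd 10).
# Its mean plays the role of the unknown parameter mu.
gen = random.Random(0)
population = [gen.gauss(50, 10) for _ in range(1000)]
mu = statistics.fmean(population)

estimates = point_estimates(population, n=25, repeats=5)
print(f"parameter mu = {mu:.2f}")
for est in estimates:
    print(f"point estimate = {est:.2f} (error {est - mu:+.2f})")
```

Changing `seed` produces a different set of estimates; this sample-to-sample variability is what an interval estimate is designed to account for.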
Table 1.1: Symbols for parameters and statistics
Measure | Parameter | Statistic (estimator)
Mean | $\mu$ | $\bar{x}$
Variance | $\sigma^2$ | $s^2$
Standard deviation | $\sigma$ | $s$
Proportion | $p$ | $\hat{p}$

Table 1.2: Formulas for statistics
Statistic (estimator) | Formula
Sample mean | $\bar{x} = \frac{\sum x}{n}$
Sample variance | $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample standard deviation | $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Sample proportion | $\hat{p} = \frac{x}{n}$

Characteristics of a Good Estimator

The aim is to obtain an estimator whose sampling distribution is centred on the parameter being estimated. The characteristics of a good estimator are unbiasedness, consistency and relative efficiency.

An unbiased estimator of a population parameter is an estimator whose expected value is equal to that parameter: an estimator $\hat{\theta}$ is an unbiased estimator of the parameter $\theta$ if $E(\hat{\theta}) = \theta$. E.g. the sample mean $\bar{x}$ is an unbiased estimator of the population mean $\mu$, since $E(\bar{x}) = \mu$.

An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size grows larger. E.g. $\bar{x}$ is a consistent estimator of $\mu$ because $V(\bar{x}) = \sigma^2 / n$; that is, as $n$ grows larger, the variance of $\bar{x}$ grows smaller.

If there are two unbiased estimators of a parameter, the one whose variance is smaller is said to be relatively efficient. E.g. for a symmetric population such as the normal, both the sample median $\tilde{x}$ and the sample mean $\bar{x}$ are unbiased estimators of the population mean. However, comparing their variances, we choose $\bar{x}$, since it is relatively efficient compared with the sample median $\tilde{x}$.

Example 1: A sample of 10 frogs was taken at random, and the weight (in grams) of each frog was recorded as below:
250 230 190 200 210 195 225 200 230 240
1. Compute the point estimate for the mean weight of the frogs.
2. Estimate the proportion of frogs that weigh not more than 200 grams.
3. Estimate the standard deviation for the weight of the frogs.

Solution:
i. Let $\bar{x}$ represent the mean weight of the frogs. Then $\bar{x} = \frac{\sum x}{n} = \frac{2170}{10} = 217$ grams.
ii. Let $x$ be the number of frogs with weight not more than 200 grams and $\hat{p}$ the point estimate for the proportion of frogs with weight not more than 200 grams. The frogs with weight not more than 200 grams are 190, 195, 200 and 200, so $x = 4$. Thus $\hat{p} = \frac{x}{n} = \frac{4}{10} = 0.4$.
iii. The estimate of the standard deviation for the weight of the frogs is given by $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{3860}{9}} \approx 20.71$ grams.

Note: the answers for questions i) and iii) can be found directly from your calculator using the SD mode. Those who use a Casio calculator can refer to Appendix 1 for the complete procedure.

Example 2: The ages of 15 students who came to the recreational club last weekend are given below:
8 17 15 15 13 15 10 12 16 16 12 16 17 15 18
Calculate the point estimate of the:
i. average age of the students
ii. variance of the age of the students
iii. proportion of students aged more than 15 years.

Example 3: A study was conducted to determine the percentage of UUM's staff living in Jitra. From a sample of 200 randomly chosen people, 88 of them live in Jitra.
i. Obtain the point estimate for the percentage of UUM's staff living in Jitra.
ii. Estimate the mean and the standard deviation of the proportion.

Interval Estimation

No matter how good the point estimator is, we have to admit that a point estimate can give a value that differs from the true value. Besides that, point estimators do not reflect the effect of larger sample sizes. Thus, it is recommended to use an interval estimator to estimate population parameters; it is less precise but safer. An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval, which we write as

Lower bound < population parameter < Upper bound

If $\hat{\theta}$ is the point estimate of the parameter $\theta$,
then the interval estimate is
$\hat{\theta} - k\,S_{\hat{\theta}} < \theta < \hat{\theta} + k\,S_{\hat{\theta}}$, or $\hat{\theta} \pm k\,S_{\hat{\theta}}$,
where $S_{\hat{\theta}}$ is the standard deviation of the estimator and $k$ is the critical value from the sampling distribution of the estimator (the distribution can be determined using the Central Limit Theorem). The value of the interval estimator therefore lies between a lower and an upper boundary.

Generally, once the interval estimate is obtained, we can conclude (with some stated percentage of confidence) that the population parameter of interest lies between the lower and upper bounds. In this section we discuss interval estimates for the mean and the proportion, as summarized in Figure 1.2.

Figure 1.2: Interval estimation

Interval estimation for mean

Generally, the interval estimator for one population mean is given by
$\bar{x} - z_{\alpha/2}\,S_{\bar{x}} < \mu < \bar{x} + z_{\alpha/2}\,S_{\bar{x}}$, or $\bar{x} \pm z_{\alpha/2}\,S_{\bar{x}}$.
Note: the Z distribution is replaced by the t distribution if the conditions for using the Z distribution are not satisfied. To determine whether to use the Z or t distribution, we follow the Central Limit Theorem.

Figure 1.3: Central Limit Theorem

Characteristics of the Z Distribution
When the standard deviation of the population is known, or the sample size is more than or equal to 30, the normal Z distribution can be used.

Figure 1.4: Condition to use normal Z distribution

Characteristics of the t Distribution
1. When the population standard deviation is unknown and the sample size is less than 30, the t distribution with $n - 1$ degrees of freedom must be used instead of the Z distribution.
2. The degrees of freedom are the number of values that are free to vary after a sample statistic has been computed.

The t distribution differs from the standard normal distribution in the following ways:
i. The variance is greater than 1.
ii. The t distribution is actually a family of curves based on the concept of degrees of freedom, which is related to sample size.
iii. As the sample size increases, the t distribution approaches the standard normal distribution.

Figure 1.5: The Z normal and t distributions
Figure 1.6: t distributions with different degrees of freedom

When should the Z or t distribution be used?
Figure 1.7: Criteria for choosing the Z or t distribution
Is the population standard deviation σ known?
- Yes: use the Z distribution, no matter what the sample size is.*
- No: is the sample size n ≥ 30?
  - Yes: use the Z distribution, with s in place of σ.
  - No: use the t distribution, with s in the formula.**
* The variable must be normally distributed when n < 30.
** The variable must be approximately normally distributed.

Therefore, the confidence interval for a mean has three formulas:
1. Confidence interval for a mean with known population standard deviation:
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, or $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
2. Confidence interval for a mean with unknown population standard deviation and sample size more than or equal to 30:
$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}$, or $\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$
3. Confidence interval for a mean with unknown population standard deviation and sample size less than 30 (n < 30):
$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$, or $\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$

The graphical view of the interval estimate:
Figure 1.8: Graphical view of confidence interval (the width of the interval runs from the lower confidence limit, LCL, to the upper confidence limit, UCL)

The probability $1 - \alpha$ is called the confidence level (or degree of confidence). The confidence level is the relative frequency of times the confidence interval actually contains the population parameter, assuming that the estimation process is repeated a large number of times. Four confidence levels are commonly used: 90%, 95%, 98% and 99%. $\alpha$ is called the significance level; in hypothesis testing it is the probability that a Type I error will occur.
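The relative-frequency meaning of the confidence level can be checked by simulation. The following is a hedged sketch, not part of the original notes: it computes $z_{\alpha/2}$ with Python's `statistics.NormalDist`, builds the known-σ interval $\bar{x} \pm z_{\alpha/2}\sigma/\sqrt{n}$ many times from samples of a hypothetical normal population (μ = 100, σ = 15, n = 25 are assumed values), and reports the fraction of intervals that contain μ. The helper names `z_critical`, `z_interval` and `coverage` are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean


def z_critical(conf):
    """Return z_{alpha/2} for a two-sided interval at confidence level conf."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_interval(xbar, sigma, n, conf):
    """Known-sigma interval for a mean: xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    e = z_critical(conf) * sigma / math.sqrt(n)
    return xbar - e, xbar + e


def coverage(mu, sigma, n, conf, trials, seed=2018):
    """Simulate `trials` samples of size n from N(mu, sigma) and return the
    fraction of the resulting intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = z_interval(fmean(sample), sigma, n, conf)
        hits += lower < mu < upper
    return hits / trials


# Hypothetical population N(100, 15) and samples of size 25.
for conf in (0.90, 0.95, 0.98, 0.99):
    z = z_critical(conf)
    cov = coverage(100, 15, 25, conf, trials=4000)
    print(f"confidence {conf:.2f}: z = {z:.4f}, observed coverage = {cov:.3f}")
```

The printed z values match the standard table of $z_{\alpha/2}$, and the observed coverages should sit close to the nominal levels, varying slightly with the seed and the number of trials.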
Critical values $z_{\alpha/2}$ of the Z distribution:
Confidence level $(1 - \alpha)$ | $\alpha$ | $\alpha/2$ | $z_{\alpha/2}$
0.90 | 0.10 | 0.05 | 1.6449
0.95 | 0.05 | 0.025 | 1.9600
0.98 | 0.02 | 0.01 | 2.3263
0.99 | 0.01 | 0.005 | 2.5758

The following are critical values $t_{\alpha/2,\,df}$ of the t distribution:
Confidence level $(1 - \alpha)$ | $\alpha$ | $\alpha/2$ | df | $t_{\alpha/2,\,df}$
0.90 | 0.10 | 0.05 | 3 | 2.3534
0.95 | 0.05 | 0.025 | 5 | 2.5706
0.98 | 0.02 | 0.01 | 7 | 2.9980
0.99 | 0.01 | 0.005 | 9 | 3.2498

Example 4: A computer company samples demand during lead time over 25 time periods:
235 421 394 261 386 374 361 439 374 316 309 514 348 302 296 499 462 344 466 332 253 369 330 535 334
It is known that the standard deviation of demand over lead time is 75 computers. Estimate the mean demand over lead time with a 95% confidence level in order to set inventory levels.

Example 5: The president of a large university wishes to estimate the average age of the students presently enrolled. From past studies, the standard deviation is known to be 2 years. A sample of 50 students is selected, and the mean is found to be 23.2 years. Find the 95% confidence interval of the population mean.

Example 6: A survey of 30 adults found that the mean age of a person's primary vehicle is 5.6 years. Assuming the standard deviation of the population is 0.8 year, find the 99% confidence interval of the population mean.

Example 7: A cereal company selects twenty-five 12-ounce boxes of corn flakes every 10 minutes and weighs the boxes. Suppose the weights have a normal distribution with a variance of 0.04 ounces². One such sample yields $\bar{x}$ = ___ ounces. Calculate the 90% confidence interval of the population mean.

Example 8: Ten randomly selected automobiles were stopped, and the tread depth of the right front tire was measured. The mean was 0.32 inch and the standard deviation was 0.08 inch. Find the 95% confidence interval of the mean depth. Assume that the variable is approximately normally distributed.

Example 9: The average production of peanuts in the state of Virginia is 3000 pounds per acre.
A new plant food has been developed and is tested on 60 individual plots of land. The mean yield with the new plant food is 3120 pounds of peanuts per acre, with a standard deviation of 578 pounds. Construct a confidence interval for the mean yield with the new plant food and interpret the interval.

Find the 95% confidence interval for the mean amount of rainfall during the summer months for the northeast part of the United States.

Example 10: The following daily highs were recorded in the city of Chicago on 20 randomly selected December days:
32 49 21 32 25 34 25 36 31 38 27 40 22 30 44 28 39 36 18 38
Find a confidence interval for the mean daily high temperature. Should we use the t or Z distribution? Explain.

Interval estimation for proportion

Whenever the information is given as a percentage, a proportion or a number of successes for a specific event, the problem being investigated involves a proportion. The procedures for drawing inferences about a proportion involve nominal and sometimes ordinal scales (i.e. categorical data). Examples of categorical data: gender (male and female), examination result (pass and fail), opinion (poor and good), job satisfaction (satisfied and unsatisfied), attendance (absent and present), etc.

The point estimate for the proportion is given by
$\hat{p} = \frac{x}{n}$
where $\hat{p}$ is the symbol for the sample proportion, $x$ is the number of sample units that possess the characteristic of interest and $n$ is the sample size.

Provided that the sample size $n$ is large and both $n\hat{p}$ and $n\hat{q}$ are greater than or equal to 5, where $\hat{p} + \hat{q} = 1$, the formula to estimate the confidence interval for a proportion is
$\hat{p} - z_{\alpha/2}\,S_{\hat{p}} < p < \hat{p} + z_{\alpha/2}\,S_{\hat{p}}$
that is,
$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$, or equivalently $\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$,
where $\hat{q} = 1 - \hat{p}$.

Example 11: A recent study of 100 people in Miami found 27 were obese. What is the proportion of individuals living in Miami who are obese? Obtain the 95% confidence interval of the proportion of individuals living in Miami who are obese and interpret it.

Example 12: A survey found that out of 200 workers, 168 said they were interrupted three or more times an hour by phone messages, faxes, etc. Estimate, with a 90% confidence level, the percentage of workers who are not interrupted three or more times an hour.

Example 13: In a random sample of 500 observations, we found the proportion of successes to be 48%. Estimate with 95% confidence the population proportion of successes.

Example 14: A random sample of 1500 pine trees was tested for traces of Bark Beetle infestation. The result showed that 153 of the trees showed such traces. Assuming the data are approximately normally distributed, calculate the point estimate of the proportion of pine trees that have been infested, and find a 95% confidence interval for the proportion of pine trees that have been infested.

Example 15: The quality control manager at Ameen Company claims that the production of model A telephones is 'out of control' when the overall rate of defects exceeds 4%. The test of a random sample of 150 telephones revealed that 9 of them are defective. Construct a 98% confidence interval for the proportion of defective telephones.

Example 16: A statistics practitioner working for a major league baseball team wants to supply radio and television commentators with interesting statistics. He observed several hundred games and counted the number of times a runner on first base attempted to steal second base. He found there were 373 such events, of which 259 were successful. Estimate with 95% confidence the population proportion of all attempted thefts of second base that are successful.

Sample size

Sample size for mean
Recall the interval formula for estimating the population mean:
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Note that $E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is the maximum error. Using the maximum error formula, we can then calculate the value of the sample size
n which is given by  σ  Zα   2 n= E2 2 2 49 .Sample size (cont…) Sample size for Mean using the maximum Error formula. Sample size (cont…) Sample size for proportion Recall back: interval formula for estimating population proportion is ˆ p − Zα ˆ ˆ p(1 − p ) 2 ˆ < p < p + Zα n 2 4 44 n 14 2 3 Error ( E ) ˆ ˆ p(1 − p ) 50 . E = Zα ˆ ˆ p(1 − p ) 2 n using the maximum Error formula. we then can calculate the value of sample size. maximum error.Sample size (cont…) Sample size for proportion note that. n which is given by ˆ (1 − p ) Zα  ˆ  ˆ q  Zα  p p ˆ    2 =  2 n= 2 2 E E 2 2 51 . the confidence level the sample size. 52 .Conclusion In conclusion. the width of the confidence interval estimate is affected by the population standard deviation. 9: relationship between width and confidence level 53 . the population standard deviation. and the sample size… x − Zα S 2 n S 2 < µ < x + Zα n S 2 n = x ± Zα A larger confidence level produces a wider confidence interval: Figure 1.Conclusion The width of the confidence interval is a function of the confidence level. Conclusion Larger values of standard confidence intervals deviation produce wider Figure 1. 54 .10: relationship between width and standard deviations Increasing the sample size decreases the width of the confidence interval while the confidence level can remain unchanged. INTERVAL ESTIMATION FOR TWO MEANS Previously… we have discussed the techniques to estimate parameters for one population mean Now. our interest will now be on the difference between two population means. consider this parameter but with two populations. 55 . With two populations. size: n2 Parameters: and Figure 1.INTERVAL ESTIMATION FOR TWO MEANS Population 1 Sample.11: Independent Population and Samples 56 . size: n1 Parameters: and Population 2 Statistics: and Sample. There are two (2) common forms of sample dependency. Independent Samples are samples that are completely unrelated to one another. 
Dependent samples, also called related (or paired) samples, occur when the response of the nth person in the second sample is partly a function of the response of the nth person in the first sample. The two common forms of sample dependency are:
- before-after and other studies in which the same people are surveyed at different points in time, including panel studies;
- matched-pairs studies in which similar people are surveyed at different points in time.

Interval estimation for difference of two independent means
In order to estimate the difference between two means, we draw random samples from each of the two populations. Initially, we will consider independent samples, that is, samples that are completely unrelated to one another. The statistic used is $\bar{x}_1 - \bar{x}_2$ (or $\bar{x}_2 - \bar{x}_1$).

Two assumptions need to be fulfilled in order to estimate the difference between two independent means:
- The samples must be independent of each other, that is, there can be no relationship between the subjects in each sample.
- The populations from which the samples were obtained must be normally distributed.

There are four (4) different formulas for the confidence interval for the difference between two independent means:
i. Confidence interval when both population variances (or standard deviations) are known.
ii. Confidence interval when both population variances (or standard deviations) are unknown but both sample sizes are more than or equal to 30.
iii. Confidence interval when both population variances (or standard deviations) are unknown, one or both sample sizes are less than 30, and both population variances are assumed equal.
iv. Confidence interval when both population variances (or standard deviations) are unknown, one or both sample sizes are less than 30, and both population variances are assumed unequal.

As with the interval estimator for one mean, the same considerations determine which formula to use for the difference between two means.

Figure 1.12: Flow diagram for choosing the correct distribution
Are both $\sigma_1$ and $\sigma_2$ known?
- Yes: use $z_{\alpha/2}$ values, no matter what the sample sizes are.*
- No: are both $n_1 \ge 30$ and $n_2 \ge 30$?
  - Yes: use $z_{\alpha/2}$ values, with s in place of σ.
  - No: use $t_{\alpha/2}$ values, with s in the formula.**
* The variable must be normally distributed when n < 30.
** The variable must be approximately normally distributed.

Figure 1.13: Flow diagram for choosing the correct confidence interval formula
The branches are the same as in Figure 1.12. In the final ($t_{\alpha/2}$) branch, conduct an equal-variances test and ask whether $\sigma_1^2 = \sigma_2^2$:
- Yes: use $t_{\alpha/2}$ values with the pooled variance estimator.
- No: use $t_{\alpha/2}$ values without pooling the variances.

Example 17: Two random samples of 40 students were drawn independently from two normal populations. The following statistics regarding their scores in a final exam were obtained. Construct a 95% confidence interval for the difference between the means.

Solution: The populations' standard deviations are unknown. However, since both sample sizes are large enough (both $n_1, n_2 \ge 30$), according to the Central Limit Theorem the sample means follow a normal distribution. The 95% confidence interval for the difference between the means is therefore computed with $z_{0.025} = 1.96$, using $s_1$ and $s_2$ in place of $\sigma_1$ and $\sigma_2$.

Example 18: A random sample of 22 male customers who shopped at this supermarket showed that they spent an average of RM80 with a standard deviation of RM17.40, while a random sample of 20 female customers who shopped at the same supermarket showed that they spent an average of RM96 with a standard deviation of RM14.50. Assume that the amounts spent at this supermarket by all male and female customers are normally distributed with equal but unknown standard deviations. Construct a 99% confidence interval for the difference between the mean amounts spent by all male and all female customers at this supermarket, and interpret the interval.

Example 19: Because of the rising costs of industrial accidents, many chemical, mining and manufacturing firms have instituted safety courses. Employees are encouraged to take these courses, which are designed to heighten safety awareness. A company is trying to decide which one of two courses to institute. To help make a decision, eight employees take Course 1 and another eight take Course 2. Each employee takes a test, which is graded out of a possible 25. The safety test results are shown below. Assume that the scores are normally distributed. Construct a 90% confidence interval for the difference of means.
Course 1 | Course 2
14 | 20
21 | 18
17 | 22
14 | 15
17 | 23
19 | 21
20 | 19
16 | 15

Example 20: Random samples of children aged 4 to 6 years sent to kindergarten in Bandar A and Bandar B were taken to find the number of hours spent on outdoor activities in the kindergarten daily. A sample of 321 children in Bandar B and 94 children in Bandar A gives means of 3.01 hours and 2.88 hours, respectively. From past studies, the population standard deviation for the children in Bandar B is assumed to be 1.09, while the population standard deviation for the children in Bandar A is 1.01. Find a 95% confidence interval for the difference between the two population means.

Interval estimation for the difference between two proportions
We will now look at procedures for drawing inferences about the difference between populations whose data are nominal (i.e. categorical). With nominal data, we can calculate the proportion of occurrences of each type of outcome. Thus,
the parameter to be estimated in this section is the difference between two population proportions, $p_1 - p_2$.

Assumptions for inferences about two proportions:
i. We have proportions from two independent simple random samples.
ii. In order to use the normal Z distribution, the conditions $n_1\hat{p}_1 \ge 5$, $n_1(1 - \hat{p}_1) \ge 5$, $n_2\hat{p}_2 \ge 5$ and $n_2(1 - \hat{p}_2) \ge 5$ must be satisfied for both samples.

To draw inferences about the parameter $p_1 - p_2$, we take a sample from each population, calculate the sample proportions
$\hat{p}_1 = \frac{x_1}{n_1}$ and $\hat{p}_2 = \frac{x_2}{n_2}$
and look at their difference; $\hat{p}_1 - \hat{p}_2$ is an unbiased estimator of $p_1 - p_2$.

The confidence interval estimator for $p_1 - p_2$ is given by
$(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \le p_1 - p_2 \le (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$
or
$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$

Example 21: A Consumer Packaged Goods (CPG) company has been testing the marketing of two new versions of soap packaging. Version one (bright colors) was distributed in one supermarket, while version two (simple colors) was distributed in another. Construct a 95% confidence interval for the difference between the two proportions of successes of packaged soap sales.

Example 22: A random sample of 500 respondents was selected in a large city to determine information concerning consumer behavior. Among the questions asked was, "Do you enjoy shopping?" Of 240 male respondents, 136 answered yes. Of 260 female respondents, 224 answered yes. Construct a 95% confidence interval estimate of the difference between the proportion of males and females who enjoy shopping.
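As a concrete check of the interval for $p_1 - p_2$, here is a minimal Python sketch. It assumes the Example 22 counts are 136 "yes" answers out of 240 males and 224 out of 260 females (group 1 = males), uses `statistics.NormalDist` for $z_{\alpha/2}$, and raises an error when any of the four normal-approximation conditions fails. The function name `two_proportion_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, conf=0.95):
    """Interval for p1 - p2:
    (p1_hat - p2_hat) +/- z_{alpha/2} * sqrt(p1_hat*q1_hat/n1 + p2_hat*q2_hat/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    # Normal-approximation conditions: n*p_hat >= 5 and n*(1 - p_hat) >= 5 for both samples.
    for n, p in ((n1, p1), (n2, p2)):
        if n * p < 5 or n * (1 - p) < 5:
            raise ValueError("normal approximation conditions not satisfied")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se


# Example 22 (assumed counts): 136 of 240 males and 224 of 260 females said "yes".
diff, lower, upper = two_proportion_ci(136, 240, 224, 260, conf=0.95)
print(f"p1_hat - p2_hat = {diff:.4f}; 95% CI: ({lower:.4f}, {upper:.4f})")
```

Under these counts the whole interval lies below 0, which would suggest that a smaller proportion of males than females enjoy shopping.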
SPSS NOTES FOR OBTAINING THE CONFIDENCE INTERVAL OF MEAN
Step 1: Select the Analyze menu → select Descriptive Statistics.
Step 2: Click on Explore → select the appropriate variable from the list of variables.
Step 3: Click on the arrow button to move the variable into the Dependent List box.
Step 4: Click on Statistics → select the appropriate statistics, e.g. Descriptives. You can change the degree of confidence (usually 90% and above).
Step 5: Then click on Continue → click on OK.

Example: A random sample of 10 university students was surveyed to determine the amount of time spent weekly using a personal computer. The times are: 13, 12, 14, 6, 8, 15, 7, 10, 5 and 3. If the times are normally distributed with a standard deviation of 5.2 hours, estimate with 90% confidence the mean weekly time spent using a personal computer by all university students.

SPSS output (Descriptives, variable: times)
Statistic | Value | Std. Error
Mean | 9.30 | 1.300
90% Confidence Interval for Mean: Lower Bound | 6.92 |
90% Confidence Interval for Mean: Upper Bound | 11.68 |
5% Trimmed Mean | 9.33 |
Median | 9.00 |
Variance | 16.900 |
Std. Deviation | 4.111 |
Minimum | 3 |
Maximum | 15 |
Range | 12 |
Interquartile Range | 8 |
Skewness | -.040 | .687
Kurtosis | -1.396 | 1.334

At a 90% confidence level, the mean weekly time spent using a personal computer by all university students is between 6.92 and 11.68.

SUMMARY
A point estimator is a good estimator if it has the qualities of a good estimator, which are unbiasedness, consistency and relative efficiency. To construct an interval, information regarding the sampling distribution of the statistic is important. Unlike point estimation, interval estimation involves an interval constructed around the point estimate with a probability (confidence level) of $1 - \alpha$.

The Central Limit Theorem enables us to determine the sampling distribution of the sample statistic based on the sample size and knowledge of the population variance.

If we want to know whether the population mean/proportion equals a certain value k, and the confidence interval for the mean/proportion includes k, we can conclude, at a given level of confidence, that there is no evidence that the mean/proportion differs from k.

If the confidence interval for the difference between two means/proportions includes 0, we can say, at a given level of confidence, that there is no significant difference (we fail to reject equality) between the means/proportions of the two populations.

END OF CHAPTER 1
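To cross-check the SPSS output in the example above, the sketch below recomputes the mean, the standard error and the 90% interval in Python. It assumes the ten times are 13, 12, 14, 6, 8, 15, 7, 10, 5 and 3 (these reproduce the reported mean of 9.30 and variance of 16.900), and it hardcodes $t_{0.05,\,9} \approx 1.8331$ as a table value because Python's standard library has no t quantile function. SPSS Explore builds this interval with the t distribution and the sample standard deviation $s$, not with the population standard deviation stated in the example.

```python
import math
import statistics

# Weekly computer-use times of the 10 students in the SPSS example.
times = [13, 12, 14, 6, 8, 15, 7, 10, 5, 3]

n = len(times)
mean = statistics.fmean(times)
s = statistics.stdev(times)   # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)         # standard error of the mean

# Assumed value: t_{0.05, 9} from a t table, because the Python standard
# library has no t-distribution quantile function.
T_CRIT = 1.8331
lower, upper = mean - T_CRIT * se, mean + T_CRIT * se
print(f"mean = {mean:.2f}, s = {s:.3f}, SE = {se:.3f}")
print(f"90% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

Using the Z distribution with a known population standard deviation instead, as Figure 1.7 prescribes, would give a somewhat different interval.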