Regression Analysis.pdf



Comments



Description

Regression Analysis: A Complete ExampleThis section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty Images A random sample of eight drivers insured with a company and having similar auto insurance policies was selected. The following table lists their driving experiences (in years) and monthly auto insurance premiums. Driving Experience (years) Monthly Auto Insurance Premium 5 2 12 9 15 6 25 16 $64 87 50 71 44 56 42 60 a. Does the insurance premium depend on the driving experience or does the driving experience depend on the insurance premium? Do you expect a positive or a negative relationship between these two variables? b. Compute SSxx, SSyy, and SSxy. c. Find the least squares regression line by choosing appropriate dependent and independent variables based on your answer in part a. d. Interpret the meaning of the values of a and b calculated in part c. e. Plot the scatter diagram and the regression line. f. Calculate r and r2 and explain what they mean. g. Predict the monthly auto insurance premium for a driver with 10 years of driving experience. h. Compute the standard deviation of errors. i. Construct a 90% confidence interval for B. j. Test at the 5% significance level whether B is negative. k. Using α = .05, test whether ρ is different from zero. both the population correlation coefficient ρ and the population regression slope B are expected to be negative. Table 13. In other words.Solution a. Σy.5 Experience Premium x y xy x2 y2 5 2 12 9 15 6 25 16 Σx = 90 64 87 50 71 44 56 42 60 Σy = 474 320 174 639 336 25 4 81 36 4096 7569 2500 5041 1936 3136 1764 3600 600 144 660 225 1050 625 960 256 2 Σxy Σx Σy2 = = = 29. and he or she has to pay a higher premium for auto insurance.5 shows the calculation of Σx. Therefore. Consequently. we calculate a and b as follows: Thus. the insurance premium is a dependent variable and driving experience is an independent variable in the regression model. the insurance premium is expected to decrease with an increase in the years of driving experience. Table 13. A new driver is considered a high risk by the insurance companies. Σxy. Σx2. On average. we expect a negative relationship between these two variables.642 4739 1396 The values of x and y are The values of SSxy. and Σy2. To find the regression line. SSxx. our estimated regression line ŷ = a + bx is . b. Based on theory and intuition. we expect the insurance premium to depend on driving experience. and SSyy are computed as follows: c. e. that is. f. Figure 13. This result is consistent with the negative relationship we anticipated between driving experience and insurance premium. the monthly auto insurance premium decreases by $1.18.21 Scatter diagram and the regression line. for every extra year of driving experience. The standard deviation of errors is i. The value of a = 76.21 shows the scatter diagram and the regression line for the data on eight auto drivers.55. Note that when b is negative.6605 gives the value of ŷ for x = 0. on average. we should not attach much importance to this statement because the sample contains drivers with only two or more years of experience. we find the predicted value of y for x = 10 is Thus.59 states that 59% of the total variation in insurance premiums is explained by years of driving experience and 41% is not. it gives the monthly auto insurance premium for a driver with no driving experience. The low value of r2 indicates that there may be many other important variables that contribute to the determination of auto insurance premiums. Thus. The value of r2 = .5476 indicates that. we expect the monthly auto insurance premium of a driver with 10 years of driving experience to be $61. The value of b gives the change in ŷ due to a change of one unit in x. The values of r and r2 are computed as follows: The value of r = −. Figure 13. However. first we calculate the standard deviation of b: . g. To construct a 90% confidence interval for B. Note that the regression line slopes downward from left to right.77 indicates that the driving experience and the monthly auto insurance premium are negatively related. y decreases as x increases. The (linear) relationship is strong but not very strong. the premium is expected to depend on the driving record of a driver and the type and age of the car. Using the estimated regression line. h. b = −1. as mentioned earlier in this chapter. For example.d. Because σ is not known. we can state with 90% confidence that B lies in the interval −2. Determine the rejection and nonrejection regions.05. The null and alternative hypotheses are written as follows: Note that the null hypothesis can also be written as H0: B ≥ 0. we use the t distribution to make the hypothesis test.943. The 90% confidence interval for B is Thus. Figure 13. We perform the following five steps to test the hypothesis about B.57 for every extra year of driving experience.For a 90% confidence level. The significance level is . the critical value of t for . the t value for .52. on average.22.05 area in the right tail of the t distribution and 6 df is 1. Step 3.943. Step 2. Calculate the value of the test statistic. Make a decision.22 Step 4. j.57 to −. . the area in each tail of the t distribution is The degrees of freedom are From the t distribution table. The value of the test statistic t for b is calculated as follows: Step 5. The < sign in the alternative hypothesis indicates that it is a lefttailed test. That is. State the null and alternative hypotheses.05 area in the left tail of the t distribution and 6 df is −1. Select the distribution to use.52 and $2. Step 1. the monthly auto insurance premium of a driver decreases by an amount between $. as shown in Figure 13. From the t distribution table. 025 (the upper limit of the p-value range). But our test is left-tailed and the observed value of t is negative.23.01.025 and . we can state that for any α equal to or greater tha n . The value of the test statistic t for r is calculated as follows: .937. State the null and alternative hypotheses.01 < p-value<. t = −2. Using the p-Value to Make a Decision We can find the range for the p-value from the t distribution table (Table V of Appendix C) and make a decision by comparing that p-value with the significance level. Calculate the value of the test statistic. Hence.937 falls in the rejection region.23 Step 4. Step 3. Note that if we use technology to find this p-value. the critical values of t are −2.447 and −3.013.937 lies between −2.025 and . we will reject the null hypothesis. df = 6 and the observed value of t is −2. we will obtain a p-value of .447 and 2.937 is between 2.447 and 3.143. For this example. Step 1. The corresponding areas in the right tail of the t distribution are .025.447.013. Then we can reject the null hypothesis for any α > . The corresponding areas in the left tail of the t distribution are . For our example.025 Thus. We perform the following five steps to test the hypothesis about the linear correlation coefficient ρ.05. The null and alternative hypotheses are: Step 2. which is greater than the upper limit of the p-value of . Figure 13. From the alternative hypothesis we know that the test is two-tailed. From Table V (the t distribution table) in the row of df = 6. Select the distribution to use. The rejection and nonrejection regions for this test are shown in Figure 13.01. Therefore the range of the p-value is . 2. From the t distribution table. k. The significance level is 5%.The value of the test statistic t = −2. we reject the null hypothesis.143. Determine the rejection and nonrejection regions. Hence. That is. Thus. Assuming that variables x and y are normally distributed. we reject the null hypothesis and conclude that B is negative. Table V of Appendix C. the monthly auto insurance premium decreases with an increase in years of driving experience. α = . we will use the t distribution to perform this test about the linear correlation coefficient. As a result. 0 . we can state that for any α equal to or greater than .35. df = 6 and the observed value of t is −2.01.000 0. Regression Analysis: Premium y versus Experience x  The regression equation is Premium y = 76. For our example.956.26) 95% PI (34.0% R-Sq(adj) = 52.025 and.94 P 0.5476 SE Coef 6.5 639.5 MS 918.Step 5. For this example. which is equal to the upper limit of the p-value.02) Values of Predictors for New Observations New Obs 1 Experience x 10. 70.05 (the upper limit of the p-value range). press F1 for help.026 R-Sq = 59.55 Experience x Predictor Constant Experience x S = 10.5270 T 11.01 -2. we reject the null hypothesis and conclude that the linear correlation coefficient between driving experience and auto insurance premium is different from zero. As a result. α = . Since the test is two tailed.961 0. the range of the p-value is Thus. we will reject the null hypothesis. The value of the test statistic t = −2.0 1557.05. t = 2.3199 Coef 76.1.5 F 8.143. 88.62 P 0. Welcome to Minitab.18 SE Fit 3.447 and 3. Using the p-Value to Make a Decision We can find the range for the p-value from the t distribution table and make a decision by comparing that p-value with the significance level. Hence.026 Predicted Values for New Observations New Obs 1 Fit 61.7 .956 falls in the rejection region.11.1% Analysis of Variance Source Regression Residual Error Total DF 1 6 7 SS 918. we reject the null hypothesis. Make a decision. From Table V (the t distribution table) in the row of df = 6.5 106. The corresponding areas in the right tail of the t distribution curve are .71 95% CI (52.956 is between 2.660 -1. 3550 h. do not reject H0 j. Construct a 95% confidence interval for B. Calculate r and r2 and explain what they mean. d. test statistic: t = 1. b. r = . critical value: t = 2. −. and SSxy. SSyy.53 to 1.271. j. critical value: t = 1.4000. Predict the cholesterol level of a 60-year-old man. Age 58 69 43 39 63 52 47 31 74 36 Cholesterol level 189 235 193 177 154 191 213 165 198 181 a. SSxx = 1895. Using α = . SSxy = 1231.265. g.8000 b. can you conclude that the linear correlation coefficient is positive? Answer: a. Taking age as an independent variable and cholesterol level as a dependent variable. Plot the scatter diagram and the regression line. h.83 i.025. Find the regression of cholesterol level on age. c. do not reject H0 .306. SSyy = 4798. Compute the standard deviation of errors. H1: B > 0. se = 22. test statistic: t = 1. e. ŷ = 156.6498x d.The following table gives information on ages and cholesterol levels for a random sample of 10 men. f. i. compute SSxx.860. 195.41. Test at the 5% significance level if B is positive.17 f.3302 + . H0: B = 0.3182 g. r2 = . Briefly explain the meaning of the values of a and b calculated in part b.6000. H1: ρ > 0. H0: ρ = 0.
Copyright © 2024 DOKUMEN.SITE Inc.