Aakash Bhatia
MBA535 (Analytical Tools for Decision-Making)
Marist College
Week 8 Assessment

Question 1: The managing partner of an advertising agency believes that his company's sales are related to the industry sales. He uses Microsoft Excel's Data Analysis tool to analyze the last 4 years of quarterly data (i.e., n = 16) with the following results:

Regression Statistics
Multiple R              0.802
R Square                0.643
Adjusted R Square       0.618
Standard Error (SYX)    0.9224
Observations            16

ANOVA
              df    SS        MS        F        Sig. F
Regression    1     21.497    21.497    25.27    0.000
Error         14    11.912    0.851
Total         15    33.409

Predictor    Coef        StdError    t Stat    P-value
Intercept    3.962       1.440       2.75      0.016
Industry     0.040451    0.008048    5.03      0.000

Durbin-Watson Statistic    1.59

a) What is the value of the quantity that the least squares regression line minimizes? Explain your answer.

A regression line (LSRL, Least Squares Regression Line) is a straight line that describes how a response variable y changes as an explanatory variable x changes. Error is defined as observed value - predicted value, and we are seeking the line that minimizes the sum of the squares of these errors. Specifically, the least squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible. That quantity is the error sum of squares (SSE).

Total sum of squares = Regression sum of squares + Error sum of squares
33.409 = 21.497 + SSE
Thus, SSE = 33.409 - 21.497 = 11.912

b) What is the prediction of Y for a quarter in which X = 120? Show how you obtain your answer.

Write R(g) = E[(Y - g(X))²], where E is the expected value with respect to the joint distribution f(x, y). Condition on X = x and let r(x) = E(Y | X = x) = ∫ y f(y | x) dy be the regression function. Let ε = Y - r(X). Then E(ε) = E[E[Y - r(X) | X = x]] = 0 and we can write Y = r(X) + ε.

Thus putting the values in the estimated equation we get
Y = 3.962 + 0.040451(120) = 8.816

c) What is the value for the coefficient of determination?

The coefficient of determination is the proportion of variation in Y "explained" by the regression on X, that is, the ratio of the explained variation to the total variation.
r² = 0.643 (given in question)

d) What is the value of the correlation coefficient?

The correlation coefficient measures the direction and strength of the linear association between Y and X. The formula for computing r is:
r = [nΣXY - (ΣX)(ΣY)] / (√[nΣX² - (ΣX)²] × √[nΣY² - (ΣY)²])
where n is the number of pairs of data. Because the slope is positive, r is the positive square root of r².
Thus r = Sqrt(0.643) = 0.802 (given in question)

Question 2: An investment specialist claims that if one holds a portfolio that moves in the opposite direction to the market index like the S&P 500, then it is possible to reduce the variability of the portfolio's return. In other words, one can create a portfolio with positive returns but less exposure to risk. A sample of 26 years of S&P 500 index and a portfolio consisting of stocks of private prisons, which are believed to be negatively related to the S&P 500 index, is collected. A regression analysis was performed by regressing the returns of the prison stocks portfolio (Y) on the returns of S&P 500 index (X) to prove that the prison stocks portfolio is negatively related to the S&P 500 index at a 5% level of significance. The results are given in the following EXCEL output.

             Coefficients    Standard Error    T Stat     P-value
Intercept    4.8660          0.3574            13.6136    8.7932E-13
S&P          -0.5025         0.0716            -7.0186    2.94942E-07

a) To test whether the prison stocks portfolio is negatively related to the S&P 500 index, the appropriate null and alternative hypotheses are, respectively,
A) H0 : ρ ≥ 0 vs. H1 : ρ < 0
B) H0 : ρ ≤ 0 vs. H1 : ρ > 0
C) H0 : r ≥ 0 vs. H1 : r < 0
D) H0 : r ≤ 0 vs. H1 : r > 0

Ans A) H0 : ρ ≥ 0 vs. H1 : ρ < 0

b) To test whether the prison stocks portfolio is negatively related to the S&P 500 index, what is the measured value of the test statistic?

-7.0186 (as given in the table)

c) To test whether the prison stocks portfolio is negatively related to the S&P 500 index, what is the p-value of the associated test statistic?

2.94942E-07/2 = 1.47471E-07 (t test on slope). Excel reports the two-tailed p-value; the test here is one-tailed, so it is halved.

d) Which of the following will be a correct conclusion? Explain your answer.
A) We cannot reject the null hypothesis and, therefore, conclude that there is sufficient evidence to show that the prisons stock portfolio and S&P 500 index are negatively related.
B) We can reject the null hypothesis and, therefore, conclude that there is sufficient evidence to show that the prisons stock portfolio and S&P 500 index are negatively related.
C) We cannot reject the null hypothesis and, therefore, conclude that there is not sufficient evidence to show that the prisons stock portfolio and S&P 500 index are negatively related.
D) We can reject the null hypothesis and conclude that there is not sufficient evidence to show that the prisons stock portfolio and S&P 500 index are negatively related.

The hypotheses are H0 : ρ ≥ 0 vs. H1 : ρ < 0. As given in the table, the coefficient and the T Stat values are negative. The two-tailed p-value is given as 2.94942E-07, so the one-tailed p-value is far below the 5% level of significance; we reject the null and conclude that the alternative is probably true. Thus they are negatively related.

Ans B) We can reject the null hypothesis and, therefore, conclude that there is sufficient evidence to show that the prisons stock portfolio and S&P 500 index are negatively related.

Question 3: It is believed that GPA (grade point average, based on a four point scale) should have a positive linear relationship with ACT scores. Given below is the Excel output from regressing GPA on ACT scores using a data set of 8 randomly chosen students from a Big-Ten university.

Regressing GPA on ACT

Regression Statistics
Multiple R           0.7598
R Square             0.5774
Adjusted R Square    0.5069
Standard Error       0.2691
Observations         8

ANOVA
              df    SS        MS        F         Significance F
Regression    1     0.5940    0.5940    8.1986    0.0286
Residual      6     0.4347    0.0724
Total         7     1.0287

             Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept    0.5681          0.9284            0.6119    0.5630     -1.7036      2.8398
ACT          0.1021          0.0356            2.8633    0.0286     0.0148       0.1895

a) The interpretation of the coefficient of determination in this regression is
A) 57.74% of the total variation of ACT scores can be explained by GPA.
B) ACT scores account for 57.74% of the total fluctuation in GPA.
C) GPA accounts for 57.74% of the variability of ACT scores.
D) None of the above.

As given in the question r² = 0.5774, and GPA is the dependent variable, thus the answer is B) ACT scores account for 57.74% of the total fluctuation in GPA.

b) What is the value of the measured test statistic to test whether there is any linear relationship between GPA and ACT?

2.8633

c) What is the predicted average value of GPA when ACT = 20? Show how you obtain your answer.

0.5681 + 0.1021(20) = 2.6101
Thus, 2.61

d) What are the decision and conclusion on testing whether there is any linear relationship at 1% level of significance between GPA and ACT scores? Explain your answer.

The p-value (0.0286) is above the significance level (0.01), so do not reject the null hypothesis; hence, there is not sufficient evidence to show that ACT scores and GPA are linearly related. Thus the answer is: do not reject the null hypothesis.

Question 4: It is believed that the average numbers of hours spent studying per day (HOURS) during undergraduate education should have a positive linear relationship with the starting salary (SALARY, measured in thousands of dollars per month) after graduation. Given below is the Excel output from regressing starting salary on number of hours spent studying per day for a sample of 51 students. Note: Some of the numbers in the output are purposely erased.

Regression Statistics
Multiple R           0.8857
R Square             0.7845
Adjusted R Square    0.7801
Standard Error       1.3704
Observations         51

ANOVA
              df           SS           MS          F           Significance F
Regression    1            335.0472     335.0473    178.3859    (erased)
Residual      (erased)     (erased)     1.8782
Total         50           427.0798

             Coefficients    Standard Error    t Stat     P-value      Lower 95%    Upper 95%
Intercept    -1.8940         0.4018            -4.7134    2.051E-05    -2.7015      -1.0865
Hours        0.9795          0.0733            13.3561    5.944E-18    0.8321       1.1269

a) What is the estimated average change in salary (in thousands of dollars) as a result of spending an extra hour per day studying?

0.9795

b) What is the value of the measured t-test statistic to test whether average SALARY depends linearly on HOURS?

13.3561

c) What is the p-value of the measured F-test statistic to test whether HOURS affects SALARY?

5.944E-18

d) What are the degrees of freedom for testing whether HOURS affects SALARY?

1 and 49 (that is, 50 - 1), the regression and residual degrees of freedom.

e) What is the error sum of squares (SSE) of the above regression? Show how you obtain your answer.

SSE = Total sum of squares - Regression sum of squares
427.0798 - 335.0472 = 92.0326

f) The 90% confidence interval for the average change in SALARY (in thousands of dollars) as a result of spending an extra hour per day studying is
A) wider than [-2.70159, -1.08654].
B) narrower than [-2.70159, -1.08654].
C) wider than [0.8321927, 1.12697].
D) narrower than [0.8321927, 1.12697].

The change in SALARY for an extra hour per day studying is the slope, so the relevant values are those in the Hours row: in the table the lower 95% limit is 0.8321 and the upper 95% limit is 1.1269. A 90% confidence interval is narrower than a 95% confidence interval. This occurs because as the precision of the confidence interval increases (i.e., CI width decreasing), the reliability of the interval containing the actual value decreases (less of a range to possibly cover it). Since the confidence level is 90%, which is less than 95%, the 90% confidence interval for the average change in SALARY (in thousands of dollars) will surely be narrower than [0.8321927, 1.12697].

Ans. D) narrower than [0.8321927, 1.12697]

g) To test the claim that average SALARY depends positively on HOURS against the null hypothesis that average SALARY does not depend linearly on HOURS, what is the p-value of the test statistic? What are the results of the test? Explain your answer.

5.944E-18/2 = 2.972E-18

An appropriate test to use is the t-test on the population correlation coefficient. A zero population correlation coefficient between a pair of random variables means that there is no linear relationship between the random variables. Excel reports the two-tailed p-value; because the claim is one-sided and the t statistic is positive, the p-value is halved. Here r² = 0.7845, so a large fraction of the variation in SALARY is explained by HOURS, and the p-value is far smaller than any conventional significance level. Thus we reject the null hypothesis and conclude that there is sufficient evidence that average SALARY depends positively on HOURS.

Question 5: The management of a chain electronic store would like to develop a model for predicting the weekly sales (in thousands of dollars) for individual stores based on the number of customers who made purchases. A random sample of 12 stores yields the following results:

Customers    Sales (Thousands of Dollars)
907          11.20
926          11.05
713          8.21
741          9.21
780          9.42
898          10.08
510          6.73
529          7.02
460          6.12
872          9.52
650          7.53
603          7.25

a) Estimate a linear regression. What are the values of the estimated intercept and slope? Show how you obtain your answer.

All values calculated using Excel and the formulas:

Regression Equation(y) = a + bx
Slope(b) = (NΣXY - (ΣX)(ΣY)) / (NΣX² - (ΣX)²)
Intercept(a) = (ΣY - b(ΣX)) / N

Where,
b = the slope of the regression line
a = the intercept point of the regression line and the y axis
x and y are the variables
N = number of values or elements
X = first score (number of customers)
Y = second score (weekly sales)
ΣXY = sum of the product of first and second scores
ΣX = sum of first scores
ΣY = sum of second scores
ΣX² = sum of squared first scores

This relationship can be represented by the equation y = b0 + b1x, where b0 is the y-intercept and b1 is the slope.

Slope(b) = 0.01001
Intercept(a) = 1.4464
Regression Equation(y): 1.446 + 0.01x

b) What is the value of the coefficient of determination?

X values: 907, 926, 713, 741, 780, 898, 510, 529, 460, 872, 650, 603
Y values: 11.20, 11.05, 8.21, 9.21, 9.42, 10.08, 6.73, 7.02, 6.12, 9.52, 7.53, 7.25

Coefficient of Determination (r²) = r × r. Thus putting the values in the formula and calculating, we get 0.9453.

c) What is the value of the coefficient of correlation?

Using the same X and Y values as in part b) and putting them in the equation:
Correlation Coefficient (r) = [NΣXY - (ΣX)(ΣY)] / (√[NΣX² - (ΣX)²] × √[NΣY² - (ΣY)²])
r = 0.9723

d) What is the value of the standard error of the estimate?

The formula for the standard error of the estimate is:
SYX = √[SSE / (N - 2)], where SSE is the sum of the squared differences between the observed and the predicted Y values.
SYX = 0.4191

e) Which of the following is the correct null hypothesis for testing whether the number of customers who make purchases affects weekly sales?
A) H0 : β0 = 0
B) H0 : β1 = 0
C) H0 : μ = 0
D) H0 : ρ = 0

Ans B) H0 : β1 = 0

f) What is the value of the t test statistic when testing whether the number of customers who make purchases affects weekly sales?

The test statistic is the estimated slope divided by its standard error:
t = b1 / Sb1, where Sb1 = SYX / √[ΣX² - (ΣX)²/N]
Here the mean of X is 715.75 and the mean of Y is 8.6117. Putting the values in the formula:
t = 13.1464

g) What are the degrees of freedom of the t test statistic when testing whether the number of customers who make purchases affects weekly sales?

Using the formula df = N - 2 and values from previous parts, we get df = 12 - 2 = 10.

h) Construct a 95% confidence interval for the change in average weekly sales when the number of customers who make purchases increases by one. Show how you obtain your answer.

Formula: because N = 12 < 30, the t-table value is used rather than the Z-table value.
CI = b1 ± tα/2 × Sb1
Where,
b1 = estimated slope
Sb1 = standard error of the slope, SYX / √[ΣX² - (ΣX)²/N]
α = 1 - (Confidence Level/100)
tα/2 = t-table value with N - 2 = 10 degrees of freedom
CI = Confidence Interval

95% confidence interval for the change in average weekly sales when the number of customers who make purchases increases by one:
0.0083 to 0.0117 thousands of dollars
The t value was generated from Excel using the TINV function.

i) Construct a 95% confidence interval for the average weekly sales when the number of customers who make purchases is 600. Show how you obtain your answer.

Formula:
CI = ŷ ± tα/2 × SYX × √h
Where,
ŷ = predicted sales at X = 600 from the regression equation
h = 1/N + (600 - mean of X)² / [ΣX² - (ΣX)²/N]
tα/2 = t-table value with 10 degrees of freedom

95% confidence interval for the average weekly sales when the number of customers who make purchases is 600:
7.1194 to 7.7864 thousands of dollars
The t value was generated from Excel using the TINV function.

j) Construct a 95% prediction interval for the weekly sales of a store that has 600 purchasing customers. Show how you obtain your answer.

Formula:
PI = ŷ ± tα/2 × SYX × √(1 + h)
with ŷ, h and tα/2 defined as in part i). The extra 1 under the square root accounts for the variation of an individual store around the average.

95% prediction interval for the weekly sales of a store that has 600 purchasing customers:
6.4614 to 8.4444 thousands of dollars
The t value was generated from Excel using the TINV function.
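The arithmetic in Questions 1 to 4 uses only numbers printed in the Excel outputs, so it can be re-checked in a few lines. The following is a minimal Python sketch of those checks, not part of the original submission; the variable names are illustrative assumptions, and every constant is copied from the tables in the questions.

```python
import math

# Question 1: values read from the printed Excel output.
sst, ssr = 33.409, 21.497
sse = sst - ssr                      # quantity minimized by least squares
r_squared = ssr / sst                # coefficient of determination
r = math.sqrt(r_squared)             # positive root because the slope is positive
y_hat_120 = 3.962 + 0.040451 * 120   # prediction of Y at X = 120

# Question 2: Excel prints a two-tailed p-value; halve it for H1: rho < 0,
# which is appropriate here because the t statistic is negative.
p_one_tail_q2 = 2.94942e-07 / 2

# Question 3: predicted GPA at ACT = 20.
gpa_hat_20 = 0.5681 + 0.1021 * 20

# Question 4: recover the erased residual row of the ANOVA table.
df_resid = 50 - 1                    # total df minus regression df
sse_q4 = 427.0798 - 335.0472         # SST - SSR
mse_q4 = sse_q4 / df_resid           # should agree with the printed MS for residual

print(f"Q1: SSE={sse:.3f}, r2={r_squared:.3f}, r={r:.3f}, Y(120)={y_hat_120:.3f}")
print(f"Q2: one-tailed p={p_one_tail_q2:.5e}")
print(f"Q3: GPA(20)={gpa_hat_20:.4f}")
print(f"Q4: df={df_resid}, SSE={sse_q4:.4f}, MSE={mse_q4:.4f}")
```

The Question 4 line doubles as a consistency check: the recovered SSE divided by 49 should reproduce the residual mean square shown in the output.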
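The Question 5 answers for parts a) to f) were obtained in Excel. As a cross-check, the same sum-based formulas for the slope, intercept and correlation coefficient can be applied directly to the 12 data pairs. This is a minimal sketch in plain Python; the function name fit_simple_regression and the dictionary keys are assumptions made for illustration, not something from the original answer.

```python
import math

customers = [907, 926, 713, 741, 780, 898, 510, 529, 460, 872, 650, 603]
sales = [11.20, 11.05, 8.21, 9.21, 9.42, 10.08,
         6.73, 7.02, 6.12, 9.52, 7.53, 7.25]


def fit_simple_regression(x, y):
    """Least squares fit of y = a + b*x using the raw-sum formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)

    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

    # Standard error of the estimate from the residuals (n - 2 df).
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))

    # t statistic for H0: beta1 = 0 is the slope over its standard error.
    ssx = sum_x2 - sum_x ** 2 / n
    t_slope = slope / (s_yx / math.sqrt(ssx))

    return {"slope": slope, "intercept": intercept, "r": r,
            "r_squared": r * r, "s_yx": s_yx, "t_slope": t_slope,
            "df": n - 2}


fit = fit_simple_regression(customers, sales)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

Rounded to four decimals, the printed values can be compared one by one with the answers to parts a) through g).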
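The three intervals in Question 5 parts h) to j) all use the same t-table value with 10 degrees of freedom. The sketch below recomputes them from the raw data under one stated assumption: the critical value 2.2281 is hard-coded from a standard t table (two-tailed, 5%, 10 df) rather than computed, because the Python standard library has no t distribution. The function name regression_intervals and its keys are illustrative.

```python
import math

customers = [907, 926, 713, 741, 780, 898, 510, 529, 460, 872, 650, 603]
sales = [11.20, 11.05, 8.21, 9.21, 9.42, 10.08,
         6.73, 7.02, 6.12, 9.52, 7.53, 7.25]

# Assumption: two-tailed 5% critical value of t with n - 2 = 10 df (t table).
T_CRIT = 2.2281


def regression_intervals(x, y, x0, t_crit):
    """Slope CI, mean-response CI and prediction interval at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))      # standard error of the estimate
    s_b1 = s_yx / math.sqrt(ssx)         # standard error of the slope

    y_hat = b0 + b1 * x0
    h = 1 / n + (x0 - x_bar) ** 2 / ssx  # distance-from-mean term
    half_mean = t_crit * s_yx * math.sqrt(h)
    half_pred = t_crit * s_yx * math.sqrt(1 + h)

    return {
        "slope_ci": (b1 - t_crit * s_b1, b1 + t_crit * s_b1),
        "mean_ci": (y_hat - half_mean, y_hat + half_mean),
        "pred_int": (y_hat - half_pred, y_hat + half_pred),
    }


result = regression_intervals(customers, sales, 600, T_CRIT)
for name, (low, high) in result.items():
    print(f"{name}: {low:.4f} to {high:.4f}")
```

The prediction interval is wider than the confidence interval for the mean because it also covers the scatter of a single store around the regression line, which is the extra 1 under the square root.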