
UNDERSTANDING STATISTICS, 3(4), 201–230
Copyright © 2004, Lawrence Erlbaum Associates, Inc.

Post Hoc Power: A Concept Whose Time Has Come

Anthony J. Onwuegbuzie
Department of Educational Measurement and Research, University of South Florida

Nancy L. Leech
Department of Educational Psychology, University of Colorado at Denver and Health Sciences Center

This article advocates the use of post hoc power analyses. First, reasons for the nonuse of a priori power analyses are presented. Next, post hoc power is defined and its utility delineated. Third, a step-by-step guide is provided for conducting post hoc power analyses. Fourth, a heuristic example is provided to illustrate how post hoc power can help to rule in/out rival explanations in the presence of statistically nonsignificant findings. Finally, several methods are outlined that describe how post hoc power analyses can be used to improve the design of independent replications.

Key words: power, a priori power, post hoc power, significance testing, effect size

Requests for reprints should be sent to Anthony J. Onwuegbuzie, Department of Educational Measurement and Research, College of Education, University of South Florida, 4202 East Fowler Avenue, EDU 162, Tampa, FL 33620–7750. E-mail: [email protected]

For more than 75 years, null hypothesis significance testing (NHST) has dominated the quantitative paradigm, stemming from the seminal works of Fisher (1925/1941) and Neyman and Pearson (1928). NHST was designed to provide a means of ruling out a chance finding, thereby reducing the chance of falsely rejecting the null hypothesis in favor of the alternative hypothesis (i.e., committing a Type I error). Although NHST has permeated the behavioral and social sciences since its inception, its practice has been subjected to severe criticism, with the number of critics growing throughout the years. The most consistent criticism of NHST is that statistical significance is not synonymous with practical importance. More specifically, statistical significance does not provide any information about how important or meaningful an observed finding is (e.g., Bakan, 1966; Cahan, 2000; Carver, 1978, 1993; Cohen, 1994, 1997; Guttman, 1985; Loftus, 1996; Meehl, 1967, 1978; Nix & Barnette, 1998; Onwuegbuzie & Daniel, 2003; Rozeboom, 1960; Schmidt, 1992, 1996; Schmidt & Hunter, 1997). As a result of this limitation of NHST, some researchers (e.g., Carver, 1993) contend that effect sizes, which represent measures of practical importance, should replace statistical significance testing completely. However, reporting and interpreting only effect sizes could lead to the overinterpretation of a finding (Onwuegbuzie & Levin, 2003; Onwuegbuzie, Levin, & Leech, 2003). As noted by Robinson and Levin (1997):

Although effect sizes speak loads about the magnitude of a difference or relationship, they are, in and of themselves, silent with respect to the probability that the estimated difference or relationship is due to chance (sampling error). Permitting authors to promote and publish seemingly "interesting" or "unusual" outcomes when it can be documented that such outcomes are not really that unusual would open the publication floodgates to chance occurrences and other strange phenomena. (p. 25)

Moreover, interpreting effect sizes in the absence of NHST could adversely affect conclusion validity, just as interpreting p values without interpreting effect sizes could lead to conclusion invalidity (Onwuegbuzie, 2001).
Table 1 illustrates the four possible outcomes and their conclusion validity when combining p values and effect sizes. This table echoes the decision table that demonstrates the relationship between Type I error and Type II error. It can be seen from Table 1 that conclusion validity occurs when (a) both statistical significance (e.g., p < .05) and a large effect size prevail or (b) both statistical nonsignificance (e.g., p > .05) and a small effect size occur. Conversely, conclusion invalidity exists if (a) statistical nonsignificance (e.g., p > .05) is combined with a large effect size (i.e., Type A error) or (b) statistical significance (e.g., p < .05) is combined with a small effect size (i.e., Type B error). Both Type A and Type B errors suggest that any declaration by the researcher that the observed finding is meaningful would be misleading and hence result in conclusion invalidity (Onwuegbuzie, 2001). In this respect, conclusion validity is similar to what Levin and Robinson (2000) termed "conclusion coherence," defined as "consistency between the hypothesis-testing and estimation phases of the decision-oriented empirical study" (p. 35). Therefore, as can be seen in Table 1, always interpreting effect sizes regardless of whether statistical significance is observed may lead to a Type A or Type B error being committed. Similarly, Type A and Type B errors also prevail when NHST is used by itself. In fact, conclusion validity is maximized only if both p values and effect sizes are utilized.

TABLE 1
Possible Outcomes and Conclusion Validity When Null Hypothesis Significance Testing and Effect Sizes Are Combined

                                          Effect Size
  p value                        Large                  Small
  Not statistically significant  Type A error           Conclusion validity
  Statistically significant      Conclusion validity    Type B error

In an attempt to maximize conclusion validity (i.e., to reduce the probability of Type A and Type B errors), Robinson and Levin (1997) proposed what they termed a "two-step process" for making statistical inferences. According to this model, a statistically significant observed finding is followed by the reporting and interpreting of one or more indexes of practical importance; however, no effect sizes are reported in light of a statistically nonsignificant finding. In other words, analysts should determine first whether the observed result is statistically significant (Step 1), and if and only if statistical significance is found, then they should report how large or important the observed finding is (Step 2). In this way, the statistical significance test in Step 1 serves as a "gatekeeper" for the reporting and interpreting of effect sizes in Step 2.
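The decision logic of Table 1 and the two-step process can be made concrete in a few lines of code. The sketch below is our illustration only: the α level matches the article's examples, but the effect-size cutoff, function names, and labels are hypothetical assumptions, not part of Robinson and Levin's (1997) proposal.

```python
# Illustrative sketch (not from the article): Table 1's outcome logic and
# the two-step "gatekeeper" process. The "large effect" cutoff (Cohen's
# d >= 0.5) is an assumed threshold for demonstration only.

ALPHA = .05
LARGE_EFFECT = 0.5  # assumed cutoff, for illustration

def conclusion_validity(p_value: float, d: float) -> str:
    """Classify an outcome according to Table 1."""
    significant = p_value < ALPHA
    large = abs(d) >= LARGE_EFFECT
    if significant == large:
        return "conclusion validity"
    if large:                      # nonsignificant but large effect
        return "Type A error risk"  # nontrivial effect goes unreported
    return "Type B error risk"      # trivial effect declared important

def two_step_report(p_value: float, d: float) -> str:
    """Robinson & Levin (1997): report an effect size only if Step 1 passes."""
    if p_value >= ALPHA:
        return "not statistically significant; no effect size reported"
    return f"statistically significant; d = {d:.2f}"

print(conclusion_validity(p_value=.20, d=0.6))  # Type A error risk
print(two_step_report(p_value=.20, d=0.6))      # effect size withheld
```

As the example shows, the two-step gatekeeper withholds exactly the kind of nonsignificant-but-large outcome that Table 1 flags as a Type A error risk, which is the motivation for the post hoc power analyses advocated in this article.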
This two-step process is indirectly endorsed by the latest edition of the Publication Manual of the American Psychological Association (American Psychological Association [APA], 2001):

When reporting inferential statistics (e.g., t tests, F tests, and chi-square), include [italics added] information about the obtained magnitude or value of the test statistic, the degrees of freedom, the probability of obtaining a value as extreme as or more extreme than the one obtained, and the direction of the effect. Be sure to include [italics added] sufficient descriptive statistics (e.g., per-cell sample size, means, correlations, standard deviations) so that the nature of the effect being reported can be understood by the reader and for future meta-analyses. (p. 22)

Three pages later, the APA (2001) stated:

Neither of the two types of probability value directly reflects the magnitude of an effect or the strength of a relationship. For the reader to fully understand the importance of your findings, it is almost always necessary to include some index of effect size or strength of relationship in your Results section. (p. 25)

On the following page, the APA (2001) stated that

The general principle to be followed, however, is to provide the reader not only with information about statistical significance but also with enough information to assess the magnitude of the observed effect or relationship. (p. 26)

Most recently, Onwuegbuzie and Levin (2003) proposed a three-step procedure for situations in which two or more hypothesis tests are conducted within the same study, which involves testing the trend of the set of hypotheses at the third step. Using either the two-step method or the three-step method helps to reduce not only the probability of committing a Type A error but also the probability of committing a Type B error, namely, declaring as important a statistically significant finding with a small effect size (Onwuegbuzie, 2001). However, whereas Type B error almost certainly will be reduced by using one of these methods compared to using NHST alone, the reduction in the probability of Type A error is not guaranteed. This is because, if statistical power is lacking, then the first step of the two-step method and the first and third steps of the three-step procedure, which serve as gatekeepers for computing effect sizes, may lead to the nonreporting of a nontrivial effect (i.e., a Type A error). Simply put, sample sizes that are too small increase the probability of a Type II error (not rejecting a false null hypothesis) and subsequently increase the probability of committing a Type A error. Both error probabilities can be reduced if researchers conduct a priori power analyses to select appropriate sample sizes. Yet, unfortunately, such analyses are rarely employed (Cohen, 1992; Keselman et al., 1998; Onwuegbuzie, 2002). When a priori power analyses have been omitted, researchers should conduct post hoc power analyses, especially for statistically nonsignificant findings. This would help researchers determine whether low power threatens the internal validity of their findings (i.e., Type A error).

Thus, in this article, we advocate the use of post hoc power analyses. First, we present reasons for the nonuse of a priori power analyses. Next, we define post hoc power and delineate its utility. Third, we provide a step-by-step guide for conducting post hoc power analyses. Fourth, we provide a heuristic example to illustrate how post hoc power can help to rule in/out rival explanations in the presence of statistically nonsignificant findings. Finally, we outline several methods that describe how post hoc power analyses can be used to improve the design of independent replications.
DEFINITION OF STATISTICAL POWER

Neyman and Pearson (1933a, 1933b) were the first to discuss the concepts of Type I and Type II error. A Type I error occurs when the researcher rejects the null hypothesis when it is true, whereas a Type II error occurs when the analyst accepts the null hypothesis when the alternative hypothesis is true. The Type I error probability is determined by the significance level (α). For example, if a 5% level of significance is designated, then the Type I error rate is 5%. Stated another way, α represents the conditional probability of making a Type I error, assuming the null hypothesis is true. Neyman and Pearson (1928) defined α as the long-run relative frequency with which Type I errors are made over repeated samples from the same population under the same null and alternative hypotheses. The conditional probability of making a Type II error under the alternative hypothesis is denoted by β.

Statistical power is the conditional probability of rejecting the null hypothesis (i.e., accepting the alternative hypothesis) when the alternative hypothesis is true; it is given by 1 – β. The most common definition of power comes from Cohen (1988), who defined the power of a statistical test as "the probability [assuming the null hypothesis is false] that it will lead to the rejection of the null hypothesis, i.e., the probability that it will result in the conclusion that the phenomenon exists" (p. 4). Power can be viewed as how likely it is that the researcher will find a relationship or difference that really prevails.

Statistical power estimates are affected by three factors.¹ The first factor is the level of significance: holding all other aspects constant, increasing the level of significance increases power, but it also increases the probability of rejecting the null hypothesis when it is true. The second influential factor is the effect size: the larger the difference between the value of the parameter under the null hypothesis and the value of the parameter under the alternative hypothesis, the greater the power to detect it. The third instrumental component is the sample size: the larger the sample size, the greater the likelihood of rejecting the null hypothesis² (Chase & Tucker, 1976; Cohen, 1965, 1969, 1988, 1992; Halpin & Easterday, 1983). However, caution is needed because large sample sizes can increase power to the point of rejecting the null hypothesis even when the associated observed finding is not practically important.

Cohen (1965), in accordance with McNemar (1960), recommended a probability of .80 or greater for correctly rejecting the null hypothesis representing a medium effect at the 5% level of significance. This recommendation was based on considering the ratio of the probability of committing a Type I error (i.e., 5%) to the probability of committing a Type II error (i.e., 1 – .80 = .20). In this case, the ratio is 1:4, reflecting the contention that Type I errors are generally more serious than Type II errors.

¹Recently, Onwuegbuzie and Daniel (2004) demonstrated empirically that power also is affected by score reliability. Specifically, Onwuegbuzie and Daniel showed that the higher the score reliability of the measures used, the greater the statistical power. For details, see Onwuegbuzie and Daniel (2002).
²Power also depends on the alternative type, namely, whether the test is one-tailed or two-tailed.
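The joint influence of the three factors can be demonstrated numerically. The following is a minimal sketch — assuming Python with SciPy, and using the two-sample t test purely as an illustration of our own choosing — of power computed from α, effect size, and sample size via the noncentral t distribution.

```python
# Sketch (assuming SciPy): power of a two-tailed, two-sample t test as a
# function of the three factors discussed above. Function name is ours.
from scipy import stats

def t_test_power(d: float, n_per_group: int, alpha: float = .05) -> float:
    """Power for Cohen's d with n_per_group participants per group."""
    df = 2 * n_per_group - 2
    nc = d * (n_per_group / 2) ** 0.5        # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # critical value under H0
    # Probability mass of the noncentral t beyond the critical values:
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

print(round(t_test_power(d=0.5, n_per_group=64), 2))             # ~.80
print(round(t_test_power(d=0.5, n_per_group=30), 2))             # lower n -> lower power
print(round(t_test_power(d=0.5, n_per_group=64, alpha=.01), 2))  # stricter alpha -> lower power
```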
Power, level of significance, effect size, and sample size are related such that any one of these components is a function of the other three. As noted by Cohen (1988), "when any three of them are fixed, the fourth is completely determined" (p. 14). Thus, there are four possible types of power analyses in which one of the parameters is determined as a function of the other three: (a) power as a function of level of significance, effect size, and sample size; (b) effect size as a function of level of significance, sample size, and power; (c) level of significance as a function of sample size, effect size, and power; and (d) sample size as a function of level of significance, effect size, and power (Cohen, 1988).

The latter type of power analysis, which is called an a priori power analysis, is the most popular and most useful for planning research studies (Cohen, 1988). It helps the researcher to ascertain the sample size necessary to obtain a desired level of power for a specified effect size and level of significance. Conventionally, most researchers set the power coefficient at .80 and the level of significance at .05. Thus, once the expected effect size and type of analysis are specified, the sample size needed to meet all specifications can be determined. By conducting such an analysis, researchers put themselves in the position to select a sample size that is large enough to lead to a rejection of the null hypothesis for a given effect size. The value of an a priori power analysis is that it helps the researcher in planning research studies (Sherron, 1988); indeed, the optimum time to conduct a power analysis is during the research design phase (Wooley & Dawson, 1983).
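In code, an a priori power analysis of type (d) amounts to searching for the smallest sample size whose computed power reaches the desired level. The sketch below is our own illustration (again assuming SciPy); the hypothesized effect size is an input the analyst must justify from prior research or theory, not something the procedure supplies.

```python
# Sketch (assuming SciPy) of an a priori power analysis of type (d):
# find the smallest per-group n giving the conventional .80 power at
# alpha = .05 for an anticipated medium effect (Cohen's d = 0.5).
from scipy import stats

def power_two_sample(d: float, n: int, alpha: float = .05) -> float:
    """Two-tailed, two-sample t-test power for effect size d, n per group."""
    df = 2 * n - 2
    nc = d * (n / 2) ** 0.5
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

def required_n(d: float, target: float = .80, alpha: float = .05) -> int:
    """Smallest per-group sample size whose power reaches the target."""
    n = 2
    while power_two_sample(d, n, alpha) < target:
        n += 1
    return n

print(required_n(d=0.5))  # 64 per group for a medium effect
```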
Failing to consider statistical power can have dire consequences for researchers. First and foremost, low statistical power reduces the probability of rejecting the null hypothesis and therefore increases the probability of committing a Type II error (Bakan, 1966; Cohen, 1965, 1969, 1988, 1992). It may also increase the probability of committing a Type I error (Overall, 1969), yield misleading results in power studies (Chase & Tucker, 1976), and prevent potentially important studies from being published as a result of publication bias (Greenwald, 1975) — the tendency to keep statistically nonsignificant results in file drawers that has been called the "file-drawer problem" (Rosenthal, 1979). A priori power analyses help researchers to obtain the necessary sample sizes to reach a decision with adequate power.

It has been exactly 40 years since Jacob Cohen (1962) conducted the first survey of power, in which he assessed the power of studies published in the abnormal-social psychology literature. Using the reported sample sizes and a nondirectional significance level of 5%, Cohen calculated the average power to detect a hypothesized effect (i.e., hypothesized power) across the 70 selected studies for nine frequently used statistical tests, using small, medium, and large estimated effect size values. The average hypothesized power of the 2,088 major statistical tests was .18, .48, and .83 for detecting a small, medium, and large effect size, respectively. The average hypothesized power of .48 for medium effects indicated that studies in the abnormal psychology field had, on average, less than a 50% chance of correctly rejecting the null hypothesis (Brewer, 1972).

During the three decades after Cohen's (1962) investigation, several researchers conducted hypothetical power surveys across a myriad of disciplines, including applied and abnormal psychology (Chase & Chase, 1976), educational research (Brewer, 1972), educational measurement (Brewer & Owen, 1973), communication (Chase & Tucker, 1975; Katzer & Sodt, 1973), counselor education (Haase, 1974), English education (Daly & Hexamer, 1983), gerontology (Levenson, 1980), communication disorders (Kroll & Chase, 1975), mass communication (Chase & Baran, 1976), marketing research (Sawyer & Ball, 1981), science education (Penick & Brewer, 1972), social psychology (Cooper & Findley, 1982), social work education (Orme & Tolman, 1986), and mathematics education (Halpin & Easterday, 1973). The average hypothetical power of these 15 studies was .24, .63, and .85 for small, medium, and large effects, respectively. Assuming that a medium effect size is appropriate for use in most studies because of its combination of being practically meaningful and realistic (Cohen, 1988; Haase, Waechter, & Solomon, 1982), the average power of .63 across these studies is disturbing. Similarly disturbing is the average hypothesized power of .64 for a medium effect reported by Rossi (1990) across 25 power surveys involving more than 1,500 journal articles and 40,000 statistical tests. An even more alarming picture is painted by Schmidt and Hunter (1997), who reported that "the average [hypothesized] power of null hypothesis significance tests in typical studies and research literature is in the .40 to .60 range (Cohen, 1962, 1965, 1988; Sedlmeier & Gigerenzer, 1989) … [with] .50 as a rough average" (p. 40). Indeed, an average hypothetical power of .5 indicates that more than one half of all statistical tests in the social and behavioral science literature will be statistically nonsignificant. As noted by Schmidt and Hunter (1997), "This level of accuracy is so low that it could be achieved just by flipping a (unbiased) coin!" (p. 40). Moreover, the publication bias that prevails in research suggests that these hypothetical power estimates likely represent an upper bound. Thus, as declared by Rossi (1997), it is possible that "at least some controversies in the social and behavioral sciences may be artifactual in nature" (p. 178). Because low statistical power can be rectified by using a larger sample, it can be argued that it represents more of a research design issue than a statistical one; as Schmidt and Hunter (1997) noted, the fact that power is unacceptably low in most studies suggests that misuse of NHST is to blame, not the logic of NHST.

Bearing in mind the importance of conducting statistical power analyses, it is extremely surprising that very few researchers conduct and report power analyses for their studies (Brewer, 1972; Cohen, 1965; Keselman et al., 1998; Onwuegbuzie, 2002; Sherron, 1988; Wooley & Dawson, 1983), even though statistical power has been promoted actively since the 1960s (Cohen, 1962, 1965, 1969) and even though, for many types of statistical analyses (e.g., z, r, F, χ²), tables have been provided by Cohen (1988, 1992) to determine the necessary sample size. Even when a priori power has been calculated, it is rarely reported (Wooley & Dawson, 1983). This lack of power analyses still prevails despite the recommendation of the APA (2001) to take power "seriously" and to "provide evidence that your study has sufficient power to detect effects of substantive interest" (p. 24). Furthermore, evidence exists that statistical power is not sufficiently understood by researchers (Schmidt, Hunter, & Urry, 1976).
The lack of use of power analysis might be the result of one or more of the following factors. First and foremost, it appears that the concept and applications of power are not taught in many undergraduate- and graduate-level statistics courses. Mundfrom, Shaw, Thomas, Young, Morse, and Moore (1998) reported that the issue of statistical power was regarded by instructors of research methodology, statistics, and measurement as only the 34th most important topic in their fields out of the 39 topics presented. Also in the Mundfrom et al. study, power received the same low ranking with respect to coverage in the instructors' classes. Moreover, when power is taught, it is likely that inadequate coverage is given. Clearly, if power is not given a high status in quantitative-based research courses, then students similarly will not take it seriously, and these students will not be suitably equipped to conduct such analyses.

Another reason for the spasmodic use of statistical power possibly stems from the incongruency between endorsement and practice. For instance, although the APA (2001) stipulated that power analyses be conducted, it provided several NHST examples but no examples of how to report statistical power (Fidler, 2002).

Harris (1997) also provided an additional rationale for the lack of power analyses:

I suspect that this low rate of use of power analysis is largely due to the lack of proportionality between the effort required to learn and execute power analyses (e.g., dealing with noncentral distributions or learning the appropriate effect-size measure with which to enter the power tables in a given chapter of Cohen, 1977) and the low payoff from such an analysis (e.g., the high probability that resource constraints will force you to settle for a lower N than your power analysis says you should have)—especially given the uncertainties involved in a priori estimates of effect sizes and standard deviations, which render the resulting power calculation rather suspect. If calculation of the sample size needed for adequate power and for choosing between alternative interpretations of a nonsignificant result could be made more nearly equal in difficulty to the effort we've grown accustomed to putting into significance testing itself, more of us might in fact carry out these preliminary and supplementary analyses. (p. 165)

A further reason why a priori power analyses are not conducted likely stems from the fact that the most commonly used statistical packages, such as the Statistical Package for the Social Sciences (SPSS; SPSS Inc., 2001) and the Statistical Analysis System (SAS Institute Inc., 2002), do not allow researchers directly to conduct a priori power analyses. Conversely, the statistical software programs that do conduct power analyses (e.g., Erdfelder, Faul, & Buchner, 1996), although extremely useful, typically do not conduct other types of analyses. Thus, researchers are forced to use at least two types of statistical software to conduct quantitative research studies, which is both inconvenient and possibly expensive. Even when researchers have power software in their possession, the lack of information regarding components needed to calculate power (e.g., effect size, variance) serves as an additional impediment to a priori power analyses.
In any case, we recommend in the strongest possible manner that all quantitative researchers conduct a priori power analyses whenever possible. These analyses should be reported in the Method section of research reports (APA, 2001). This report also should include a rationale for the criteria used for all input variables (i.e., power, significance level, effect size, sample size, number of variables studied) needed to design a trustworthy study. Inclusion of such analyses will help researchers to make optimum choices on these components.

POST HOC POWER ANALYSES

Whether or not an a priori power analysis is undertaken and reported, problems can still arise. This is because a priori power analyses involve the use of a priori estimates of effect sizes and standard deviations (Harris, 1997). Before the study is conducted, researchers do not know what the observed effect size will be; all they can do is try to estimate it based on previous research and theory (Wilkinson & the Task Force on Statistical Inference, 1999). The observed effect size could end up being much smaller or much larger than the hypothesized effect size on which the power analysis is undertaken. In other words, a priori power analyses do not represent the power to detect the observed effect of the ensuing study; rather, they represent the power to detect hypothesized effects. (Indeed, this is a criticism of the power surveys highlighted previously.) In particular, if the observed effect size is smaller than what was proposed, the sample size yielded by the a priori power analysis might be smaller than is needed to detect it; a smaller effect size than anticipated thus increases the chances of a Type II error.

One problem that commonly occurs in educational research arises when the study is completed and a statistically nonsignificant result is found. Unfortunately, most researchers do not determine whether the statistically nonsignificant result is the result of insufficient statistical power. Instead, the researcher disregards the study (i.e., the file-drawer problem) or submits the final report to a journal for review where, without knowing the power of the statistical test, it is not possible to rule in or rule out low statistical power as a threat to internal validity (Onwuegbuzie, 2001). Nor can an a priori power analysis necessarily rule in/out this threat. It is likely that the lack of power analyses, coupled with publication bias, promulgates the publishing of findings that are statistically significant but have small effect sizes (possibly increasing Type B error), as well as leading researchers to eliminate valuable hypotheses (Halpin & Easterday, 1973; Sawyer & Ball, 1981).
Conveniently, the effect of power on a statistically nonsignificant finding can be assessed more appropriately by using the observed (true) effect to investigate the performance of the NHST (Mulaik, Raju, & Harshman, 1997). Such a technique leads to what is often called a post hoc power analysis. To this end, several authors have recommended the use of post hoc power analyses for statistically nonsignificant findings, so that statistically nonsignificant results can make a greater contribution to the research community than they presently do (Cohen, 1965, 1969, 1988; Dayton, Schafer, & Rogers, 1973; Fagley & McKinney, 1983; Sawyer & Ball, 1981; Wooley & Dawson, 1983).

When post hoc power should be reported has been the subject of debate. Although some researchers advocate that post hoc power always be reported (e.g., Fagley, 1985), the majority of researchers advocate reporting post hoc power only for statistically nonsignificant results (Cohen, 1965, 1969, 1988; Fagley & McKinney, 1983; Sawyer & Ball, 1981; Schmidt, 1996; Tversky & Kahneman, 1971; Wooley & Dawson, 1983). However, both sets of analysts have agreed that estimating the power of significance tests that yield statistically nonsignificant findings plays an important role in their interpretations (e.g., Fagley & McKinney, 1983; Tversky & Kahneman, 1971). Specifically, statistically nonsignificant results in a study with low power suggest ambiguity, whereas statistically nonsignificant results in a study with high power contribute to the body of knowledge because power can be ruled out as a threat to internal validity (e.g., Fagley, 1985; Schmidt, 1996). As noted by Fagley (1985), "Just as rejecting the null does not guarantee large and meaningful effects, accepting the null does not preclude interpretable results" (p. 392).

Interestingly, post hoc power analyses can be conducted relatively easily because some of the major statistical software programs compute post hoc power estimates. For example, post hoc power coefficients are available in SPSS for the general linear model (GLM); the post hoc power procedure for analyses of variance (ANOVAs) and multivariate ANOVAs is contained within the "options" button. Post hoc power (or "observed power," as it is called in the SPSS output) "is based on taking the observed effect size as the assumed population effect, which produces a positively biased but consistent estimate of the effect" (D. Nichols,³ personal communication, November 4, 2002).⁴ It should be noted that, due to sampling error, the observed effect size used to compute the post hoc power estimate might be very different from the true (population) effect size, culminating in a misleading evaluation of power. In what follows, we use Cohen's (1988) framework for conducting post hoc power for multiple regression and correlation analysis.

³David Nichols is a Principal Support Statistician and Manager of Statistical Support, SPSS Technical Support.
⁴Details of the post hoc power computations in the statistical algorithms for the GLM and MANOVA procedures can be obtained from the following Web site: http://www.spss.com/tech/stat/Algorithms.htm
PROCEDURE FOR CONDUCTING POST HOC POWER ANALYSES

Although post hoc power analysis procedures exist for multivariate tests of statistical significance, such as multivariate analysis of variance and multivariate analysis of covariance, these methods are sufficiently complex to warrant use of statistical software. Thus, for this article, we restrict our attention to post hoc power analysis procedures for univariate tests of statistical significance. What follows is a description of how to undertake a post hoc power analysis by hand for univariate tests. Here, we use Cohen's (1988) framework for multiple regression and correlation analysis because multiple regression subsumes all other univariate statistical techniques as special cases (Cohen, 1968). Most important, this post hoc power analysis framework can be used for other members of the univariate GLM, including correlation analysis, t tests, ANOVA, and analysis of covariance. The advantage of this framework is that it can be used for any type of relationship between the independent variable(s) and the dependent variable (e.g., linear or curvilinear, general or partial, whole or partial, direct variates or covariates, and main effects or interactions). Furthermore, the independent variables can be quantitative or qualitative (i.e., nominal scale), can represent ex post facto or retrospective research, and can be naturally occurring or the result of experimental manipulation.

The F test can be used for all univariate members of the GLM — that is, for situations in which there is one quantitative dependent variable and one or more independent variables — to test the null hypothesis that the proportion of variance in the dependent variable accounted for by the independent variables (PVi) is zero in the population. This F test can be expressed as

F = (PVi / dfi) / (PVe / dfe),   (1)

where PVi is the proportion of the variance in the dependent variable explained by the set of independent variables, PVe is the proportion of error variance, dfi is the degrees of freedom for the numerator (i.e., the number of independent variables in the set), and dfe is the degrees of freedom for the denominator (i.e., the error variance). Equation 1 can be rewritten as

F = (PVi / PVe) × (dfe / dfi).   (2)

The left-hand term in Equation 2, which represents the proportion of variance in the dependent variable accounted for by the set of independent variables relative to the proportion unexplained, is a measure of the effect size (ES) in the sample, that is,

ES = PVi / PVe.   (3)

The right-hand term in Equation 2, the study size or sensitivity of the statistical test, provides information about the sample size and number of independent variables. Thus, as noted by Cohen (1988), the degree of statistical significance is a function of the product of the effect size and the study size.

Because most theoretically selected independent variables have a nonzero relationship (i.e., nonzero effect size) with dependent variables, the distribution of F values in any particular investigation likely takes the form of a noncentral F distribution (Murphy & Myors, 1998). The noncentrality parameter, λ, an index that determines the extent to which the null hypothesis is false, is given by

λ = ES(dfi + dfe + 1).   (4)

The shape of the noncentral F distribution, as well as its range, is a function of both the noncentrality parameter and the respective degrees of freedom (i.e., dfi and dfe). The (post hoc) power of a statistical significance test can be defined as "the proportion of the noncentral distribution that exceeds the critical value used to define statistical significance" (Murphy & Myors, 1998, p. 24). Once the noncentrality parameter, λ, has been calculated, Table 2 or Table 3, depending on the level of α, can be used to determine the post hoc power of the statistical test. Conveniently, these steps can also be used for a priori power analyses by substituting a hypothesized effect size value for the observed effect size value.

Step 1: Statistical Significance Criterion (α)

Table 2 represents α = .05, and Table 3 represents α = .01; thus, these tables are entered for values of α. (Interpolation and extrapolation techniques can be used for α values between .01 and .05 and outside these limits.)

Step 2: The Noncentrality Parameter (λ)

In Tables 2 and 3, power values are presented for 15 values of λ. Because λ is a continuous variable, linear interpolation typically is needed.
[Tables 2 and 3, which span several pages, are reproduced from Statistical Power Analysis for the Behavioral Sciences (2nd ed., pp. 420–423), by J. Cohen, 1988, Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.; reprinted with permission. Table 2 gives the power of the F test as a function of λ, dfi, and dfe for α = .05; Table 3 gives the corresponding values for α = .01. Each table presents power for 23 values of dfi from 1 to 120; for dfe = 20, 60, 120, and ∞; and for 15 values of λ from 2 to 40. Entries marked with an asterisk denote power values greater than .995.]

Step 3: Numerator Degrees of Freedom (dfi)

Both Table 2 and Table 3 present entries for 23 values of dfi; for other values of dfi, linear interpolation can be used.

Step 4: Denominator Degrees of Freedom (dfe)

For each value of dfi, power coefficients are presented for the following four values of dfe: 20, 60, 120, and ∞. Again, linear interpolation can be used for other values. (For interpolation when λ < 2, it should be noted that at λ = 0, power = α.) Interpolation between dfe values should be linear with respect to the reciprocals of dfe. In particular, when the power coefficient is needed for a given λ at a dfe that falls between a lower tabled value, Ldfe, and an upper tabled value, Udfe, the following formula is used:

Power = PowerL + [(1/Ldfe – 1/dfe) / (1/Ldfe – 1/Udfe)] × (PowerU – PowerL),   (5)

where PowerL and PowerU are the power values obtained for the given λ at Ldfe and Udfe, respectively (themselves obtained by linear interpolation where necessary). For Udfe = ∞, 1/Udfe = 0, such that for dfe > 120 the denominator of Equation 5 is 1/120 – 0 = .00833 (Cohen, 1988).
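Readers with statistical software can skip the table lookup and interpolation entirely by evaluating the noncentral F distribution directly. The following sketch — our illustration, assuming SciPy; the function name and example values are ours — implements Equations 3 and 4 together with the definition of power given previously.

```python
# Sketch of Steps 1-4 computed directly from the noncentral F distribution
# (assuming SciPy), in place of the Table 2 / Table 3 lookup.
from scipy import stats

def post_hoc_power(pv_i: float, df_i: int, df_e: int, alpha: float = .05) -> float:
    """Power of the F test given the observed proportion of variance
    explained (PVi), per Equations 3 and 4."""
    es = pv_i / (1.0 - pv_i)                      # Equation 3: ES = PVi / PVe
    lam = es * (df_i + df_e + 1)                  # Equation 4: noncentrality
    f_crit = stats.f.ppf(1 - alpha, df_i, df_e)   # critical F under H0
    return stats.ncf.sf(f_crit, df_i, df_e, lam)  # mass beyond the critical F

# Hypothetical example: PVi = .13, 3 predictors, N = 104 (so dfe = 100).
print(round(post_hoc_power(pv_i=.13, df_i=3, df_e=100), 2))  # roughly .9, consistent with Table 2
```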
(7) The error degrees of freedom for Case B is given by dfe = N – dfi – dfb – 1. (5) For Udfe = ∞. 2 1 – RY ×A (6) where the numerator represents the proportion of variance explained by the variables in Set A and the denominator represents the error variance proportion. B – RY ×B 2 1 – RY ×A. In particular. with the effect size being represented by 2 2 RY ×A. and 0. with the effect size being represented by 2 RY ×A . yielding . involves testing a set (A) of one or more independent variables. that is. if statistical significance is not reached. with Cases A and B being special cases of Case C.. then the researcher should conduct a post hoc power analysis in an attempt to rule in or to rule out inadequate power (e. Chandler. and Moore (2003) conducted a study investigating characteristics associated with teachers’ views on discipline. Onwuegbuzie. Thompson. at the 5% level). and dfc is the number of variables in Set C.POST HOC POWER 219 2 RY ×A. Conversely. For an example of how to compute upper bounds for post hoc power estimates. HEURISTIC EXAMPLE Recently. if statistical significance is reached (e. 2001. Collins. (10) where dfi is the number of variables contained in Set A. with the effect size being represented by 2 2 RY ×A. Therefore. Specifically. FRAMEWORK FOR CONDUCTING POST HOC POWER ANALYSES We agree that post hoc power analyses should accompany statistically nonsignificant findings. For each hypothesis. 1997). Filer.C . 1980.g.C (9) The error degrees of freedom for Case B is given by dfe = N – dfi – dfb – dfc – 1. we recommend that upper bounds for post hoc power estimates be computed (Steiger & Fouladi. the reader is referred to Steiger and Fouladi (1997).. Equation 3 can then be used. then the researcher should report the effect size and confidence interval around the effect size (e.g.80) as a threat to the internal validity of the finding. Cumming & Finch. Steiger & Fouladi. the next step is to test the hypotheses. As noted by Cohen (1988). Once data have been collected. Bird.g. B – RY ×B 2 1 – RY ×A. In particular. Figure 1 displays our power-based framework for conducting NHST. 2002. 1957. such analyses can provide useful information for replication studies. . However. 1997. 2002). this is beyond the scope of this article.. once the research purpose and hypotheses have been determined.5 In fact. 1992. B. power < . Fleishman. This upper bound is estimated via the noncentrality parameter. dfb is the number of variables in Set B. 5Moreover. B. Case C represents the most general case. the next step is to use an a priori power analysis to design the study. Witcher. the components of the post hoc power analysis can be used to conduct a priori power analyses in subsequent replication investigations. Cohen’s [1988] . although not presented here. we restrict our attention to one of them.e. can be found by examining the original study. ethnicity (i. Witcher.220 ONWUEGBUZIE AND LEECH I d e n t if y re s e a rc h p u rp o s e D e t e rm in e a n d s tate hy pothes es C o lle c t data C o n d u c t a p rio ri p o w e r a n a ly s is a s p a rt o f p la n n in g re s e a rc h d e s ig n C o n d u c t N u ll H y p o t h e s is S ig n if ic a n c e Te s t S t a t is t ic a l s ig n if ic a n c e a c h ie v e d ? No R e p o rt a n d in t e rp re t p o s t -h o c p o w e r c o e f f ic ie n t Ye s P o s t -h o c power c o e f f ic ie n t lo w ? 
[Figure 1. Power-based framework for conducting null hypothesis significance tests. The flowchart proceeds: identify research purpose → determine and state hypotheses → conduct a priori power analysis as part of planning the research design → collect data → conduct the null hypothesis significance test. If statistical significance is achieved, report the effect size and the confidence interval around the effect size. If not, report and interpret the post hoc power coefficient: if the post hoc power coefficient is low, lack of power is a rival explanation of the statistically nonsignificant finding; if it is not low, power is ruled out as a rival explanation of the statistical nonsignificance.]

HEURISTIC EXAMPLE

Recently, Onwuegbuzie, Witcher, Collins, Filer, Chandler, and Moore (2003) conducted a study investigating characteristics associated with teachers' views on discipline. Although several independent variables were examined by Onwuegbuzie, Witcher, et al., we restrict our attention to one of them, namely, ethnicity (i.e., Caucasian American vs. minority) and its relationship to discipline styles. The theoretical framework for this investigation, although not presented here, can be found by examining the original study.

Participants were 201 students at a large mid-southern university who were either preservice (77.0%) or in-service (23.0%) teachers. The preservice teachers were selected from several sections of an introductory-level undergraduate education class, whereas the in-service teachers were graduate students enrolled in one of two sections of a research methodology course. The sample size was selected via an a priori power analysis because it provided acceptable statistical power (i.e., .82) for detecting a moderate difference in means (i.e., d = .5) at the (two-tailed) .05 level of significance (Erdfelder et al., 1996).

During the 1st week of class, participants were administered the Beliefs on Discipline Inventory (BODI), which was developed by Roy T. Tamashiro and Carl D. Glickman (as cited in Wolfgang & Glickman, 1986). This measure was constructed to assess teachers' beliefs on classroom discipline by indicating the degree to which they are noninterventionists, interventionists, or interactionalists. The BODI contains 12 multiple-choice items, each with two response options; for each item, participants are asked to select the statement with which they most agree. The BODI contains three subscales, representing the Noninterventionist, Interventionist, and Interactionalist orientations, with scores on each subscale ranging from 0 to 8. A high score on any of these scales represents a teacher's proclivity toward the particular discipline approach. For this study, the Noninterventionist, Interventionist, and Interactionalist subscales generated scores that had classical theory alpha reliability coefficients of .72 (95% confidence interval [CI] = .66, .77), .94 (95% CI = .93, .95), and .75 (95% CI = .69, .80), respectively.

A series of independent t tests, using the Bonferroni adjustment to maintain a family-wise error of 5% (i.e., .05/3 ≈ .0167 for each of the three subscales used), revealed no statistically significant difference between Caucasian American (n = 175) and minority (n = 26) participants for scores on the Interventionist, t(199) = –1.47, p > .05, Noninterventionist, t(199) = 0.88, p > .05, and Interactionalist, t(199) = 0.52, p > .05, subscales. On the basis of these results, Onwuegbuzie, Witcher, et al. (2003) could have concluded that there were no ethnic differences in discipline beliefs. However, after finding statistical nonsignificance, the researchers decided to conduct a post hoc power analysis. The post hoc power analysis for this test of ethnic differences revealed low statistical power, and Onwuegbuzie, Witcher, et al. (2003) concluded the following:

The finding of no ethnic differences in discipline beliefs also is not congruent with Witcher et al. (2001), who reported that minority preservice teachers less often endorsed classroom and behavior management skills as characteristic of effective teachers than did Caucasian-American preservice teachers. However, the non-significance could have stemmed from the relatively small proportion of minority students (i.e., 12.9%), which induced relatively low statistical power (i.e., .59) for detecting a moderate effect size for comparing the two groups with respect to three outcomes (Erdfelder et al., 1996). More specifically, using the actual observed effect sizes pertaining to these three differences, and applying the Bonferroni adjustment, the post-hoc statistical power estimates were .12, .06, and .03 for the interventionist, noninterventionist, and interactionalist orientations,
respectively—which all represent extremely low statistical power for detecting the small observed effect sizes. Replications are thus needed to determine the reliability of the present findings of no ethnic differences in discipline belief. (p. 19)

Although Onwuegbuzie, Witcher, et al. (2003) computed their power coefficients using a statistical power program (i.e., Erdfelder et al., 1996), these estimates could have been calculated by hand using Equations 3 and 4 and Tables 2 and 3. For example, consider the power of the test of racial differences in levels of interventionist orientation. The proportion of variance in interventionist orientation accounted for by race (i.e., the effect size) can be obtained by using one of the following three transformation formulae:

ES = t² / (t² + dfe),   (11)

ES = dfi(F) / [dfi(F) + dfe],   (12)

ES = d² / (d² + 4),   (13)

where t² is the square of the t value, F is the F value, d is Cohen's d statistic, and dfi and dfe are the numerator and denominator degrees of freedom, respectively (Murphy & Myors, 1998). Using the t value of –1.47 reported previously by Onwuegbuzie, Witcher, et al., with the associated numerator degrees of freedom (dfi) of 1 (i.e., one independent variable: race) and denominator degrees of freedom (dfe) of 199 (i.e., total sample size – 2), Equation 11 yields

ES = (–1.47)² / [(–1.47)² + 199] = .0107.   (14)

Next, we substitute this value of ES and the numerator and denominator degrees of freedom into Equation 4 to yield a noncentrality parameter of λ = .0107(1 + 199 + 1) = 2.15 ≈ 2. Now, to determine the power of this test, Table 3 (i.e., α = .01) would be used as a result of the Bonferroni adjustment (i.e., .05/3). Here, dfe = 199 falls between Ldfe = 120 and Udfe = ∞; however, because both the lower and upper power values for dfi = 1 and λ = 2 are .12, interpolation via Equation 5 is unnecessary, and our estimated power value is .12 — extremely low statistical power for detecting the small observed effect size of 1.07%. The same procedure can be used to verify the post hoc power values of .06 and .03 for the noninterventionist and interactionalist orientations, respectively.

If Onwuegbuzie, Witcher, et al. (2003) had stopped after finding statistically nonsignificant results, this would have been yet another instance of the deepening file-drawer problem (Rosenthal, 1979). By conducting a post hoc power analysis, Onwuegbuzie, Witcher, et al. realized that the conclusion that no ethnic differences prevailed in discipline beliefs likely would have resulted in a Type II error being committed. Similarly, if Onwuegbuzie, Witcher, et al. had bypassed conducting a statistical significance test and had merely computed and interpreted the associated effect size, they would have increased the probability of committing a Type A error. Either way, conclusion validity would have been threatened.
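The hand computation above can be checked numerically. The sketch below (ours, assuming SciPy; not part of Onwuegbuzie, Witcher, et al.'s analysis) reproduces Equations 11 and 4 for the interventionist comparison and evaluates the power exactly instead of via Table 3.

```python
# Sketch verifying the worked example: t(199) = -1.47 at the
# Bonferroni-adjusted alpha, via Equations 11 and 4 and the noncentral F.
from scipy import stats

t, df_i, df_e, alpha = -1.47, 1, 199, .01

es = t**2 / (t**2 + df_e)        # Equation 11: ES = .0107
lam = es * (df_i + df_e + 1)     # Equation 4: lambda ~ 2.16 (the article
                                 # rounds ES to .0107 first, giving 2.15)
f_crit = stats.f.ppf(1 - alpha, df_i, df_e)
power = stats.ncf.sf(f_crit, df_i, df_e, lam)
print(round(es, 4), round(lam, 2), round(power, 2))
# ES = .0107, lambda ~ 2.16, power ~ .13 (the table lookup gives .12)
```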
By conducting a post hoc power analysis, Onwuegbuzie, Witcher, et al. (2003) realized that the conclusion that no ethnic differences prevailed in discipline beliefs likely would have resulted in a Type II error being committed. The post hoc power analysis allowed the statistically nonsignificant finding pertaining to ethnicity to be placed in a more appropriate context. This in turn helped the researchers to understand that their statistically nonsignificant finding was not necessarily due to an absence of a relationship between ethnicity and discipline style in the population but was due at least in part to a lack of statistical power.

As noted by Onwuegbuzie and Levin (2003) and Onwuegbuzie, Levin, and Leech (2003), the best way to increase conclusion validity is through independent (external) replications of results (i.e., two or more independent studies yielding similar findings that produce statistically and practically compatible outcomes). Indeed, independent replications make "invaluable contributions to the cumulative knowledge in a given domain" (Robinson & Levin, 1997, p. 25). The post hoc analysis documented previously, by revealing low statistical power first and foremost, strengthens the rationale for conducting independent replications. As such, post hoc power analyses can play a very important role in promoting as well as improving the quality of external replications. Moreover, researchers interested in performing independent replications could use the information from Onwuegbuzie, Witcher, et al.'s (2003) post hoc analysis for their subsequent a priori power analyses.

In addition, Onwuegbuzie, Witcher, et al. (2003), as part of their post hoc power analysis, could have conducted what we call a what-if post hoc power analysis to determine how many more cases would have been needed to increase the power coefficient to a more acceptable level (e.g., .80). Thus, what-if post hoc power analyses provide useful information for future researchers. Even novice researchers who are unable or unwilling to conduct a priori power analyses can use the post hoc power values to warn them to employ larger samples (i.e., n > 201) in independent replication studies than that used in Onwuegbuzie, Witcher, et al.'s (2003) investigation. Although we strongly encourage these researchers to perform a priori power analyses, novice researchers also can use information from what-if post hoc power analyses to set their minimum group/sample sizes. Indeed, we believe that what-if post hoc power analyses place the novice researcher in a better position to select appropriate sample sizes, ones that reduce the probability of committing both Type II and Type A errors. Even more information about future sample sizes can be gleaned by constructing confidence intervals around what-if post hoc power estimates using the (95%) upper and lower confidence limits around the observed effect size (cf. Bird, 2002). Specifically, Hedges and Olkin's (1985) z-based 95% CI6 could be used to determine the upper and lower limits that subsequently would be used to derive a confidence interval around the estimated minimum sample size needed to obtain adequate power in an independent replication.

6According to Hess and Kromrey (2002), for the two-sample t-test context, Hedges and Olkin's (1985) z-based confidence interval procedure compares favorably with all other methods of constructing confidence bands, including Steiger and Fouladi's (1992, 1997) interval inversion approach.
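A what-if post hoc power analysis of the kind just described can be sketched as follows: hold the observed effect size fixed and increase the sample size until the projected power reaches the target. Python with SciPy is again assumed, as are two details the passage leaves open, namely the target of .80 and an unadjusted alpha of .05 for the replication test.

```python
# "What-if" post hoc power analysis: smallest total N at which the observed
# effect size would yield the target power for a two-group t test.
from scipy import stats

def what_if_n(es, target_power=0.80, alpha=0.05, df_num=1):
    n_total = 4                                  # smallest workable two-group N
    while True:
        df_den = n_total - 2                     # error df for two groups
        ncp = es * (df_num + df_den + 1)         # Equation 3
        f_crit = stats.f.ppf(1 - alpha, df_num, df_den)
        power = 1 - stats.ncf.cdf(f_crit, df_num, df_den, ncp)
        if power >= target_power:
            return n_total, power
        n_total += 1

# Observed ES of .0107 for the interventionist comparison:
n_needed, achieved = what_if_n(0.0107)
print(n_needed, round(achieved, 2))  # roughly 700+ cases, versus the 201 used
```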
Post hoc power analyses also could be used for meta-analytic purposes. For instance, if researchers routinely were to conduct post hoc power analyses, meta-analysts could determine the average observed power across studies for any hypothesis of interest. Knowledge of observed power across studies in turn would help researchers interpret the extent to which studies in a particular area were truly contributing to the knowledge base. Low average observed power across independent replications would necessitate research design modifications in future external replications. For example, sensitivity might be improved by increasing the sample size in independent replication studies. Alternatively, sensitivity might be increased by using measures that yield more reliable and valid scores or a study design that allows researchers to control for undesirable sources of variability in their data (Murphy & Myors, 1998). Here, sensitivity refers to "the degree to which sampling error introduces imprecision into the results of a study" (Murphy & Myors, 1998, p. 5).

Additionally, what we call weighted meta-analyses could be conducted, wherein effect sizes across studies are weighted by their respective post hoc power estimates. In particular, the effect-size estimate of interest would be multiplied by this value to yield a power-adjusted effect-size coefficient. For example, if an analysis revealed a post hoc power estimate of .60, the observed effect-size coefficient would be adjusted by this amount. In so doing, effect sizes resulting from studies with adequate power would be given more weight than effect sizes derived from investigations with low power. However, virtually no meta-analyst has formally used this technique.

Similarly, if a researcher conducts both an a priori and a post hoc power analysis within the same study, then a power discrepancy index (PDI) could be computed, which represents the difference between the a priori and post hoc power coefficients. The PDI would then provide information to researchers for increasing the sensitivity of future a priori power analyses.

We have outlined several ways in which post hoc power analyses can be used to improve the quality of present and future studies. However, this list is by no means exhaustive. Indeed, we are convinced that other uses of post hoc power analyses are waiting to be uncovered. Regardless of the way in which a post hoc power analysis is utilized, it is clear that this approach represents good detective work because it enables researchers to get to know their data better and consequently puts them in a position to interpret their data in a more meaningful way and with more conclusion validity.
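The weighting scheme and the PDI just described reduce to a few lines of arithmetic. The following sketch uses invented study values purely for illustration; neither the effect sizes nor the power estimates come from any real meta-analysis.

```python
# Power-weighted meta-analysis and the power discrepancy index (PDI),
# with hypothetical inputs.
def power_discrepancy_index(a_priori_power, post_hoc_power):
    return a_priori_power - post_hoc_power

# Hypothetical studies as (effect size, post hoc power estimate) pairs
studies = [(0.45, 0.82), (0.30, 0.35), (0.50, 0.60)]

# Power-adjusted effect sizes: e.g., a power estimate of .60 scales that ES by .60
adjusted = [es * power for es, power in studies]

# Power-weighted mean effect size across studies
weighted_mean = (sum(es * power for es, power in studies)
                 / sum(power for _, power in studies))

print(adjusted, round(weighted_mean, 3))
# PDI for the running example: .82 a priori versus .12 post hoc
print(power_discrepancy_index(0.82, 0.12))   # 0.70
```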
SUMMARY AND CONCLUSIONS

Robinson and Levin (1997) proposed a two-step procedure for analyzing empirical data whereby researchers first evaluate the probability of an observed effect (i.e., statistical significance), and if and only if statistical significance is found, they then assess the effect size. Recently, Onwuegbuzie and Levin (2003) proposed a three-step procedure for use when two or more hypothesis tests are conducted within the same study, which involves testing the trend of the set of hypotheses at the third step. Although both methods are appealing, their effectiveness depends on the statistical power of the underlying hypothesis tests. Specifically, if power is lacking, then the first step of the two-step method and the first and third steps of the three-step procedure, which serve as gatekeepers for computing effect sizes, may lead to the nonreporting of a nontrivial effect (i.e., a Type A error). Because the typical level of power for medium effect sizes in the behavioral and social sciences is around .50 (Cohen, 1962), the incidence of Type A error likely is high. This incidence can be reduced if researchers conduct an a priori power analysis to select appropriate sample sizes. Unfortunately, such analyses are rarely employed (Cohen, 1992).

Therefore, we recommend that a priori power analyses always be conducted and reported. Moreover, when a statistically nonsignificant finding emerges, we believe that a post hoc power analysis also should be performed, even when an a priori power analysis has been conducted. This would help researchers determine whether low power threatens the internal validity of their findings. Indeed, if researchers were to conduct post hoc power analyses regardless of whether statistical significance is reached, and especially when statistically nonsignificant findings prevail, then meta-analysts would be able to assess whether independent replications are making significant contributions to the accumulation of knowledge in a given domain (a sketch of this reporting logic follows below).

In this article, we have advocated the use of post hoc power analyses in empirical studies. First, reasons for the nonuse of a priori power analyses were presented. Next, post hoc power was defined and its utility delineated. Third, a step-by-step guide was provided for conducting post hoc power analyses. Fourth, a heuristic example was provided to illustrate how post hoc power can help to rule in/out rival explanations to observed findings. Finally, several methods were outlined that describe how post hoc power analyses can be used to improve the design of independent replications. Although we advocate the use of post hoc power analyses, we believe that such approaches should never be used as a substitute for a priori power analyses. Regardless, we believe that post hoc power analyses should play a central role in NHST, especially if one or more statistically nonsignificant findings emerge.
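The reporting policy advocated here, which is also the logic of Figure 1, can be summarized as a small decision function. The .80 cutoff separating adequate from inadequate power is our illustrative choice rather than a rule stated in the article.

```python
# Power-based reporting logic (cf. Figure 1), as an illustrative sketch.
def reporting_path(p_value, alpha, post_hoc_power=None, adequate=0.80):
    if p_value < alpha:
        return "Report effect size and confidence interval around effect size."
    if post_hoc_power is None:
        return "Conduct a post hoc power analysis before interpreting the null."
    if post_hoc_power < adequate:
        return ("Lack of power is a rival explanation of the statistically "
                "nonsignificant finding; plan an independent replication.")
    return ("Power is ruled out as a rival explanation of the statistical "
            "nonsignificance.")

# E.g., a nonsignificant test (hypothetical p value) with post hoc power of .12
print(reporting_path(p_value=0.14, alpha=0.05 / 3, post_hoc_power=0.12))
```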
We believe that such a policy of conducting and reporting a priori and post hoc power analyses would simultaneously reduce the incidence of Type II and Type A errors and subsequently reduce the incidence of publication bias and the file-drawer problem. Although it could be argued that this recommendation is ambitious, it is no more ambitious than the editorial policies at the 20 journals that now formally stipulate that effect sizes be reported for all statistically significant findings (Capraro & Capraro, 2002). In fact, we agree with Wooley and Dawson (1983), who suggested "editorial policies to require all such information relating to a priori design considerations and post hoc interpretation to be incorporated as a standard component of any research report submitted for publication" (p. 680). This surely would represent a step in the right direction. In any case, post hoc power provides a nice balance in report writing because we believe that post hoc power coefficients are to statistically nonsignificant findings as effect sizes are to statistically significant findings. This can only help to increase the accumulation of knowledge across studies because meta-analysts will have much more information with which to work.

REFERENCES

American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437.
Bird, K. D. (2002). Confidence intervals for effect sizes in analysis of variance. Educational and Psychological Measurement, 62, 197–226.
Brewer, J. K. (1972). On the power of statistical tests in the American Educational Research Journal. American Educational Research Journal, 9, 391–401.
Brewer, J. K., & Owen, P. W. (1973). A note on the power of statistical tests in the Journal of Educational Measurement. Journal of Educational Measurement, 10, 71–74.
Cahan, S. (2000). Statistical significance is not a "kosher certificate" for observed effects: A critical analysis of the two-step approach to the evaluation of empirical results. Educational Researcher, 29(1), 31–34.
Capraro, R. M., & Capraro, M. M. (2002). Treatments of effect sizes and statistical significance tests in textbooks. Educational and Psychological Measurement, 62, 771–782.
Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48, 378–399.
Carver, R. P. (1993). The case against statistical significance testing, revisited. Journal of Experimental Education, 61, 287–292.
Chandler, R. E. (1957). The statistical concepts of confidence and significance. Psychological Bulletin, 54, 429–430.
Chase, L. J., & Baran, S. J. (1976). An assessment of quantitative research in mass communication. Journalism Quarterly, 53, 308–311.
Chase, L. J., & Chase, R. B. (1976). A statistical power analysis of applied psychological research. Journal of Applied Psychology, 61, 234–237.
Chase, L. J., & Tucker, R. K. (1975). A power-analytical examination of contemporary communication research. Speech Monographs, 42(1), 29–41.
Cohen, J. (1962). The statistical power of abnormal social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145–153.
Cohen, J. (1965). Some statistical issues in psychological research. In B. B. Wolman (Ed.), Handbook of clinical psychology (pp. 95–121). New York: McGraw-Hill.
Cohen, J. (1968). Multiple regression as a general data-analytic system. Psychological Bulletin, 70, 426–443.
Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic.
Cohen, J. (1973). Statistical power analysis and research results. American Educational Research Journal, 10, 225–230.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences (Rev. ed.). New York: Academic.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Cohen, J. (1997). The earth is round (p < .05). In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 21–35). Mahwah, NJ: Lawrence Erlbaum Associates.
Cooper, H., & Findley, M. (1982). Expected effect sizes: Estimates for statistical power analysis in social psychology. Personality and Social Psychology Bulletin, 8, 168–173.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532–574.
Daly, J. A., & Hexamer, A. (1983). Statistical power in research in English education. Research in the Teaching of English, 17, 157–164.
Dayton, C. M., Schafer, W. D., & Rogers, B. G. (1973). On appropriate uses and interpretations of power analysis: A comment. American Educational Research Journal, 10, 231–234.
Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis program. Behavior Research Methods, Instruments, & Computers, 28, 1–11.
Fagley, N. S. (1985). Applied statistical power analysis and the interpretation of nonsignificant results by research consumers. Journal of Counseling Psychology, 32, 391–396.
Fagley, N. S., & McKinney, I. J. (1983). Reviewer bias for statistically significant results: A reexamination. Journal of Counseling Psychology, 30, 298–300.
Fidler, F. (2002). The fifth edition of the APA Publication Manual: Why its statistics recommendations are so controversial. Educational and Psychological Measurement, 62, 749–770.
Fisher, R. A. (1941). Statistical methods for research workers (8th ed.). Edinburgh, Scotland: Oliver & Boyd. (Original work published 1925)
Fleishman, A. I. (1980). Confidence intervals for correlation ratios. Educational and Psychological Measurement, 40, 659–670.
Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1–20.
Guttman, L. (1985). The illogic of statistical inference for cumulative science. Applied Stochastic Models and Data Analysis, 1, 3–10.
Haase, R. F. (1974). Power analysis of research in counselor education. Counselor Education and Supervision, 14, 124–132.
Haase, R. F., Waechter, D. M., & Solomon, G. S. (1982). How significant is a significant difference? Average effect size of research in counseling psychology. Journal of Counseling Psychology, 29, 58–65.
Harris, R. J. (1997). Reforming significance testing via three-valued logic. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 145–174). Mahwah, NJ: Lawrence Erlbaum Associates.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic.
Hess, M. R., & Kromrey, J. D. (2002, April). An empirical comparison of methods for constructing confidence bands around standardized mean differences. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.
Katzer, J., & Sodt, J. (1973). An analysis of the use of statistical testing in communication research. The Journal of Communication, 23, 251–265.
Keselman, H. J., Huberty, C. J., Lix, L. M., Olejnik, S., Cribbie, R., Donahue, B., et al. (1998). Statistical practices of educational researchers: An analysis of their ANOVA, MANOVA, and ANCOVA analyses. Review of Educational Research, 68, 350–386.
Kroll, R. M., & Chase, L. J. (1975). Communication disorders: A power analytical assessment of recent research. Journal of Communication Disorders, 8, 237–247.
Levenson, R. L. (1980). Statistical power analysis: Implications for researchers, planners, and practitioners in gerontology. Gerontologist, 20, 494–498.
Levin, J. R., & Robinson, D. H. (2000). Statistical hypothesis testing, effect-size estimation, and the conclusion coherence of primary research studies. Educational Researcher, 29(1), 34–36.
Loftus, G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5, 161–171.
McNemar, Q. (1960). At random: Sense and nonsense. American Psychologist, 15, 295–300.
Meehl, P. E. (1967). Theory testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103–115.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
Mulaik, S. A., Raju, N. S., & Harshman, R. A. (1997). There is a time and a place for significance testing. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 65–115). Mahwah, NJ: Lawrence Erlbaum Associates.
Mundfrom, D. J., Shaw, D., Thomas, L., Young, S., & Moore, A. D. (1998, November). Introductory graduate research courses: An examination of the knowledge base. Paper presented at the annual meeting of the Mid-South Educational Research Association, New Orleans, LA.
Mundfrom, D. J., Young, S., Cribbie, R., & Shaw, D. (2001, November). YAPP: Yet another power program. Paper presented at the annual meeting of the Mid-South Educational Research Association, Little Rock, AR.
Murphy, K. R., & Myors, B. (1998). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests. Mahwah, NJ: Lawrence Erlbaum Associates.
Neyman, J., & Pearson, E. S. (1928). On the use and interpretation of certain test criteria for the purposes of statistical inference. Biometrika, 20A, 175–240, 263–294.
Neyman, J., & Pearson, E. S. (1933a). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London, Series A, 231, 289–337.
Neyman, J., & Pearson, E. S. (1933b). Testing statistical hypotheses in relation to probabilities a priori. Proceedings of the Cambridge Philosophical Society, 29, 492–510.
Nix, T. W., & Barnette, J. J. (1998). The data analysis dilemma: Ban or abandon. A review of null hypothesis significance testing. Research in the Schools, 5(2), 3–14.
Onwuegbuzie. J. Orme. The statistical power of a decade of social work education research. & Daniel.. R. & Hunter. J. What do data really mean? American Psychologist. Reflections on statistical and substantive significance. Witcher. 1. 377–381.. E. (2003). W. Factors associated with teachers’ beliefs about discipline in the context of practice. 21–26. A. A. Sawyer. J. D. J. J.POST HOC POWER 229 Onwuegbuzie. (2004). (1997). with a slice of replication. Statistical power and effect size in marketing research. 18. J. 1173–1181. Onwuegbuzie. What if there were no significance tests? (pp. Measurement and Evaluation in Counseling and Development. 47. T. Robinson. J. 10(2). NJ: Lawrence Erlbaum Associates. E. 11–22. Classical statistical hypotheses within the context of Bayesian theory.. A case study in the failure of psychology as a cumulative science: The spontaneous recovery of verbal learning. 619–632. J. Journal of Marketing Research. H. 285–292. & Brewer. In L. Retrieved October 10. L. (2003). 26.. Educational Research Quarterly. 37–40. A. & Daniel.. Rossi. F. Educational Researcher. J. L. 2. & Daniel.. 175–197). In L. S. (2003). & J. A. Onwuegbuzie.. Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. G. The “file-drawer problem” and tolerance for null results. (2003). & Levin. L. The fallacy of the null hypothesis significance test. J. (1997). J. L. Rossi. & Levin. Psychological Bulletin. Schmidt. Research in the Schools. G. and subgroup differences. J. 57. L. 11(1). H. 646–656. G. Cary. Filer. M. & Moore. (1986).. Learning Disabilities: A Contemporary Journal. Schmidt. SAS Institute. Manuscript in preparation. Statistical power of psychological research: What have we gained in 20 years? Journal of Consulting and Clinical Psychology. 71–90. SAS/STAT User’s Guide (Version 8. J. R. (1996). A. confidence intervals. Overall. (1969). (1997).. A. (2002). G. A. Onwuegbuzie. J. February 12). Journal of Research in Science Teaching.ed. 61–72. Research in the Schools. (2002). 416–428. Rosenthal. Steiger . 6(2) [Online]. R. Mulaik. Schmidt. W. J. & Leech. NC: Author.2) [Computer software]. J. J. A. A. A. Reliability generalization: The importance of considering sample specificity. L.). Current Issues in Education. 86. Levin. S. 58. K. (1960). & Ball. Mahwah. 1.. F. Social Science Review. A. The power of statistical tests in science teaching research. A.asu. Harlow. A. A. A new proposed three-step method for testing result direction. Psychological Methods. Inc. & Tolman. J. 26(5). (1981). Expanding the framework of internal and external validity in quantitative research. Psychological Bulletin. 2004. S. Collins. 9.. Without supporting statistical evidence. 10. Onwuegbuzie. & J. (2003. Mulaik. R. Inc. (1990). (1972). (2002). from http://cie. Research in the Schools. 35–44. L.. Do effect-size measures measure up? A brief assessment. Rozeboom. A framework for reporting and interpreting internal consistency reliability estimates. (1979). Psychological Bulletin. R. 35. N. .. & J. Harlow. Belief in the law of small numbers. (1992).. & Kahneman. (1971). E. 581–582. J. 25–32. T. H. & Glickman. Witcher. E.). R2: A computer program for interval estimation. J. E. & Gigerenzer. & The Task Force on Statistical Inference. H. Hunter. Inc.. A. G.. & Urry. (1989). 221–257). and hypothesis testing for the squared multiple correlation. power calculation. G. A. (2002). V. 105–110.. Instruments. T. (2001). Witcher. A. (1999). 594–604.. 
Schmidt, F. L., Hunter, J. E., & Urry, V. W. (1976). Statistical power in criterion-related validation studies. Journal of Applied Psychology, 61, 473–485.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309–316.
Sherron, R. H. (1988). Power analysis: The other half of the coin. Community/Junior College Quarterly, 12, 169–175.
SPSS, Inc. (2001). SPSS 11.0 for Windows [Computer software]. Chicago: Author.
Steiger, J. H., & Fouladi, R. T. (1992). R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation. Behavior Research Methods, Instruments, and Computers, 24, 581–582.
Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of statistical models. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 221–257). Mahwah, NJ: Lawrence Erlbaum Associates.
Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31(3), 25–32.
Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76, 105–110.
Wilkinson, L., & The Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.
Witcher, A. E., Onwuegbuzie, A. J., & Minor, L. C. (2001). Characteristics of effective teachers: Perceptions of preservice teachers. Research in the Schools, 8(2), 45–57.
Wolfgang, C. H., & Glickman, C. D. (1986). Solving discipline problems: Strategies for classroom teachers (2nd ed.). Boston: Allyn & Bacon.
Wooley, T. W., & Dawson, G. O. (1983). A follow-up power analysis of the statistical tests used in the Journal of Research in Science Teaching. Journal of Research in Science Teaching, 20, 673–681.