(After some further study, ShuQin found that the phenomenon she had encountered is called "dissonance". Further citations on post hoc tests, summarized by Shuqin Guo.)
The following is drawn from Roger E. Kirk's (1982) Experimental Design (pp. 115-125).
Fisher's LSD (Least Significant Difference)
When the LSD test is used for the follow-up comparisons, the conceptual unit for the error rate is the individual comparison; that is, the procedure does not control the error rate at $\alpha$ for the collection of tests as a whole. Using the LSD test can lead to an anomalous situation in which the overall F statistic is significant but none of the pairwise comparisons is significant. This can happen because the overall F test is equivalent to a simultaneous test of the hypothesis that all possible contrasts among the means are equal to zero. The contrast that is significant, however, may involve some linear combination of means such as $\mu_1 - (\mu_2 + \mu_3)/2$ rather than a pairwise difference such as $\mu_1 - \mu_2$.
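A minimal Python sketch of the LSD logic, assuming equal n per treatment level and the usual LSD critical difference $t_{\alpha/2;\nu}\sqrt{2MS_{\mathrm{error}}/n}$ (that formula is not given in the passage above). The critical t value is passed in as `t_crit`, read from a t table; the function names and example numbers are hypothetical. `contrast_t` tests one complex contrast, showing how a combination such as $\mu_1 - (\mu_2 + \mu_3)/2$ can reach significance while no pairwise difference does.

```python
import math
from itertools import combinations


def lsd_critical_difference(t_crit, ms_error, n):
    """Fisher's LSD: t_{alpha/2; nu} * sqrt(2 * MS_error / n), equal n assumed."""
    return t_crit * math.sqrt(2.0 * ms_error / n)


def lsd_pairwise(means, ms_error, n, t_crit):
    """Map each pair (i, j) to True/False; every pair is tested at alpha on its own."""
    cd = lsd_critical_difference(t_crit, ms_error, n)
    return {(i, j): abs(means[i] - means[j]) > cd
            for i, j in combinations(range(len(means)), 2)}


def contrast_t(means, coefs, ms_error, n):
    """t statistic for the contrast sum(c_j * mean_j), equal n assumed."""
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    psi_hat = sum(c * m for c, m in zip(coefs, means))
    se = math.sqrt(ms_error * sum(c * c for c in coefs) / n)
    return psi_hat / se
```

For example, with hypothetical means `[10.0, 7.5, 7.5]`, $MS_{\mathrm{error}} = 4$, $n = 5$ ($\nu = 12$) and the tabled $t_{.025;12} = 2.179$, the LSD is about 2.76, so neither 2.5-point difference is significant, yet the contrast $\mu_1 - (\mu_2 + \mu_3)/2$ gives $t \approx 2.28 > 2.179$. The overall F for these particular numbers is not significant; the example only illustrates the pairwise-versus-complex-contrast mechanism.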
Tukey's HSD Test (Honestly Significant Difference)
Tukey's test requires equal n's in all treatment levels. According to Tukey's procedure, the critical difference $\hat{\psi}_{\mathrm{HSD}}$ that a pairwise comparison must exceed to be declared significant is
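$$\hat{\psi}_{\mathrm{HSD}} = q_{\alpha;p,\nu}\sqrt{MS_{\mathrm{error}}/n}$$
where $q_{\alpha;p,\nu}$ is the upper $\alpha$ point of the studentized range distribution for $p$ means and $\nu$ error degrees of freedom, and $n$ is the number of observations per treatment level.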
A test of the overall null hypothesis $\mu_1 = \mu_2 = \cdots = \mu_p$ is provided by comparing the largest pairwise difference between means with the critical difference $\hat{\psi}_{\mathrm{HSD}}$, whose $q$ value is obtained from a table. This procedure, which uses a range statistic, is an alternative to the overall F test. For most sets of data, the range and F tests lead to the same decision about the overall null hypothesis; however, the F test is generally more powerful.
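A minimal sketch of Tukey's HSD and the associated range test, assuming equal n and a tabled value of $q_{\alpha;p,\nu}$ supplied as `q_crit` (no distribution function is computed here). The function names and data are hypothetical.

```python
import math
from itertools import combinations


def tukey_hsd(means, ms_error, n, q_crit):
    """Tukey's HSD for p means with equal n.

    q_crit is q_{alpha; p, nu} read from a studentized range table.
    Returns the critical difference and the set of index pairs (i, j)
    whose absolute mean difference exceeds it.
    """
    hsd = q_crit * math.sqrt(ms_error / n)
    significant = {(i, j) for i, j in combinations(range(len(means)), 2)
                   if abs(means[i] - means[j]) > hsd}
    return hsd, significant


def range_test(means, ms_error, n, q_crit):
    """Range test of mu_1 = ... = mu_p: largest difference versus the HSD."""
    hsd = q_crit * math.sqrt(ms_error / n)
    return max(means) - min(means) > hsd
```

With hypothetical means `[12.0, 9.5, 8.0, 7.0]`, $MS_{\mathrm{error}} = 6$, $n = 6$ ($\nu = 20$) and the tabled $q_{.05;4,20} \approx 3.96$, the HSD is 3.96; only pairs (0, 2) and (0, 3) exceed it, and the range test rejects the overall null hypothesis because $12 - 7 = 5 > 3.96$.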
Scheffé's S Procedure
Scheffé's S procedure (1953) is one of the most flexible, conservative, and robust data-snooping procedures available. If the overall F statistic is significant, Scheffé's procedure can be used to evaluate all a posteriori contrasts among means, not just the pairwise comparisons. In addition, it can be used with unequal n's. The experimentwise error rate is equal to $\alpha$ for the infinite number of possible contrasts among $p \ge 3$ means. Since an experimenter always evaluates only a subset of the possible contrasts, Scheffé's procedure tends to be conservative. For example, it is much less powerful than Tukey's HSD procedure for evaluating pairwise comparisons and consequently is recommended only when complex contrasts are of interest. Scheffé's procedure uses the F sampling distribution and, like ANOVA, is robust with respect to nonnormality and heterogeneity of variance.
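A minimal sketch of Scheffé's test for a single contrast, using the standard critical value $\sqrt{(p-1)F_{\alpha;p-1,\nu}}\sqrt{MS_{\mathrm{error}}\sum_j c_j^2/n_j}$ (not spelled out in the passage above). The tabled F value is supplied as `f_crit`, and group sizes may differ. The function name and data are hypothetical.

```python
import math


def scheffe_test(means, ns, coefs, ms_error, f_crit):
    """Scheffe's S test of one contrast psi = sum(c_j * mu_j).

    f_crit is F_{alpha; p-1, nu} from an F table (nu = error df).
    Unequal group sizes ns are allowed.
    Returns (psi_hat, critical_value, significant).
    """
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    p = len(means)
    psi_hat = sum(c * m for c, m in zip(coefs, means))
    # Standard error of the contrast: sqrt(MS_error * sum(c_j^2 / n_j)).
    se = math.sqrt(ms_error * sum(c * c / nj for c, nj in zip(coefs, ns)))
    critical = math.sqrt((p - 1) * f_crit) * se
    return psi_hat, critical, abs(psi_hat) > critical
```

With hypothetical means `[12.0, 9.5, 8.0, 7.0]`, six observations per group, $MS_{\mathrm{error}} = 6$ and the tabled $F_{.05;3,20} \approx 3.10$, the complex contrast $\mu_1 - (\mu_2 + \mu_3 + \mu_4)/3$ ($\hat\psi \approx 3.83$, critical value $\approx 3.52$) is significant, but the pairwise contrast $\mu_1 - \mu_3$ ($\hat\psi = 4$) is not, because its Scheffé critical value is about 4.31. Tukey's HSD for the same data would be $q_{.05;4,20}\sqrt{6/6} \approx 3.96$, which illustrates why Scheffé's procedure is not recommended for pairwise comparisons.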
Newman-Keuls Test
A different approach to evaluating a posteriori pairwise comparisons stems from the work of Student (1927), Newman (1939), and Keuls (1952). The Newman-Keuls procedure is based on a stepwise, or layered, approach to significance testing. Sample means are ordered from smallest to largest. The largest difference, which involves means that are $r = p$ steps apart, is tested first at the $\alpha$ level of significance; if it is significant, means that are $r = p - 1$ steps apart are tested at the $\alpha$ level, and so on. The Newman-Keuls procedure provides an r-mean significance level equal to $\alpha$ for each group of $r$ ordered means; that is, the probability of falsely rejecting the hypothesis that all means in an ordered group are equal is $\alpha$. It follows that the concept of error rate applies neither on an experimentwise nor on a per-comparison basis; the actual error rate falls somewhere between the two. The Newman-Keuls procedure, like Tukey's procedure, requires equal sample n's.
According to the Newman-Keuls procedure, the critical difference $\hat{\psi}_{W_r}$ that two means separated by $r$ steps must exceed to be declared significant is
$$\hat{\psi}_{W_r} = q_{\alpha;r,\nu}\sqrt{MS_{\mathrm{error}}/n}$$
Note that the Newman-Keuls and Tukey procedures require the same critical difference for the first comparison tested. The Tukey procedure uses this critical difference for all of the remaining tests, whereas the Newman-Keuls procedure reduces the size of the critical difference according to the number of steps separating the ordered means. As a result, the Newman-Keuls test is more powerful than Tukey's test. Remember, however, that the Newman-Keuls procedure does not control the experimentwise error rate at $\alpha$.
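A minimal sketch of the layered Newman-Keuls procedure, assuming equal n and tabled values $q_{\alpha;r,\nu}$ supplied in a dictionary `q_table` keyed by $r$. It follows the conventional rule that a range of means lying inside a range already found non-significant is declared non-significant without being tested; that rule is implied rather than stated in the passage above. The function name and data are hypothetical.

```python
import math


def newman_keuls(means, ms_error, n, q_table):
    """Newman-Keuls step-down test for p means with equal n.

    q_table maps r (number of ordered means spanned, 2..p) to q_{alpha; r, nu}
    from a studentized range table. Returns a dict mapping each pair of
    original group indices (i, j), i < j, to True (significant) or False.
    """
    order = sorted(range(len(means)), key=lambda k: means[k])  # smallest -> largest
    p = len(order)
    se = math.sqrt(ms_error / n)
    nonsig_ranges = []  # (lo, hi) positions in `order` found non-significant
    result = {}
    for r in range(p, 1, -1):  # widest ranges first: r = p, p - 1, ..., 2
        for lo in range(p - r + 1):
            hi = lo + r - 1
            if any(a <= lo and hi <= b for a, b in nonsig_ranges):
                # Nested inside a non-significant range: not tested.
                sig = False
            else:
                w_r = q_table[r] * se  # critical difference W_r
                sig = means[order[hi]] - means[order[lo]] > w_r
                if not sig:
                    nonsig_ranges.append((lo, hi))
            i, j = sorted((order[lo], order[hi]))
            result[(i, j)] = sig
    return result
```

With hypothetical means `[12.0, 9.0, 8.0, 7.0]`, $MS_{\mathrm{error}} = 6$, $n = 6$ and tabled values $q_{.05;2,20} \approx 2.95$, $q_{.05;3,20} \approx 3.58$, $q_{.05;4,20} \approx 3.96$, the procedure flags pairs (0, 1), (0, 2) and (0, 3). Tukey's HSD applies 3.96 to every pair and so would not flag (0, 1), whose difference is 3; this is the power difference described above.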
Frequently, the overall null hypothesis $\mu_1 = \mu_2 = \cdots = \mu_p$ is tested with an F statistic in ANOVA rather than with a range statistic. If the F statistic is significant, Shaffer (1979) recommends using the critical difference $\hat{\psi}_{W_{r-1}}$ instead of $\hat{\psi}_{W_r}$ to evaluate the largest pairwise comparison at the first step of the testing procedure; the procedure for all subsequent steps is unchanged. She has shown that this modification increases power at the first step without affecting control of the Type I error rate. It also makes dissonances, in which the overall null hypothesis is rejected by an F test without rejecting any of the proper subsets of comparisons, less likely.
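A minimal sketch of Shaffer's modification at the first step only, assuming equal n, $p \ge 3$, and tabled $q$ values in `q_table` keyed by $r$. Whether the ANOVA F test was significant is passed in as the flag `f_significant`; the function name and data are hypothetical.

```python
import math


def newman_keuls_first_step(means, ms_error, n, q_table, f_significant):
    """First Newman-Keuls step, with Shaffer's (1979) modification.

    If the overall ANOVA F test was significant, the largest pairwise
    difference is compared with W_{p-1} instead of W_p; later steps are
    unchanged. Returns (critical_difference, significant).
    """
    p = len(means)
    if p < 3:
        raise ValueError("the modification assumes at least three means")
    r = p - 1 if f_significant else p
    critical = q_table[r] * math.sqrt(ms_error / n)
    return critical, max(means) - min(means) > critical
```

With hypothetical means `[11.3, 9.0, 8.0, 7.5]` ($\max - \min = 3.8$), $MS_{\mathrm{error}} = 6$, $n = 6$, $q_{.05;3,20} \approx 3.58$ and $q_{.05;4,20} \approx 3.96$, the largest difference is significant against $\hat\psi_{W_{p-1}} = 3.58$ but would not be against $\hat\psi_{W_p} = 3.96$. After a significant F, this is the kind of case in which the modification can prevent a dissonance.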