You are right that separate pairwise comparisons are likely to generate type I errors. However, we can choose a lower alpha level or reduce the number of groups to make the analysis possible. Again, depending on your theoretical perspective for the study, you may wish to use one of the groups as the control and then compare the combination of the other groups with the control group.

Meihua Zhai: I checked some other sources and came up with some different "suspicions". The two books I checked are Keppel, Geoffrey (1982), *Design and Analysis: A Researcher's Handbook* (2nd ed.), and Kirk, Roger E. (1982), *Experimental Design: Procedures for the Behavioral Sciences* (2nd ed.). Based on what I read, I think your problem might be due to the stringent nature of the Scheffé test, since this test is considered robust to unequal sample sizes.

(After some further study, Shuqin found that the phenomenon she encountered is called "dissonance". Further citations on post hoc tests, summarized by Shuqin Guo, follow.)

The following is taken from Roger E. Kirk's (1982) *Experimental Design* (pp. 115-125).

__Fisher's LSD (Least Significant Difference)__

When subsequent tests are performed, the conceptual
unit for the error rate is the individual comparison, which means that the
LSD test does not control the error rate at $\alpha$
for the collection of tests. The use of the LSD test can lead to an anomalous
situation in which the overall *F* statistic is significant, but none of the
pairwise comparisons is significant. This situation can arise because the
overall *F* test is equivalent to a simultaneous test of the hypothesis that
all possible contrasts among means are equal to zero. The contrast that is
significant, however, may involve some linear combination of means such as
$\mu_1 - (\mu_2 + \mu_3)/2$ rather than $\mu_1 - \mu_2$.
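To make the point about per-comparison error rates concrete, here is a minimal Python sketch of how the chance of at least one type I error grows with the number of pairwise tests. It treats the tests as independent, which is an assumption made only for illustration (pairwise comparisons on the same data are correlated), and the function name `familywise_error_rate` is hypothetical.

```python
from math import comb


def familywise_error_rate(alpha: float, n_groups: int) -> float:
    """Approximate probability of at least one type I error over all
    pairwise comparisons among n_groups means, each tested at alpha.

    Assumes the comparisons are independent (illustration only).
    """
    n_comparisons = comb(n_groups, 2)  # p(p - 1)/2 pairwise tests
    return 1.0 - (1.0 - alpha) ** n_comparisons


for p in (3, 4, 5):
    rate = familywise_error_rate(0.05, p)
    print(f"{p} groups, {comb(p, 2)} comparisons: {rate:.3f}")
# 3 groups, 3 comparisons: 0.143
# 4 groups, 6 comparisons: 0.265
# 5 groups, 10 comparisons: 0.401
```

Under this simplification, even five groups push the collective error rate to roughly eight times the nominal per-comparison level.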

__Tukey's HSD Test (Honestly Significant Difference)__

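Tukey's test requires that the *n*'s in each treatment level be equal. The critical difference, $\hat{\psi}(HSD)$,
that a pairwise comparison must exceed to be declared
significant is, according to Tukey's procedure,

$$\hat{\psi}(HSD) = q_{\alpha;p,\nu}\sqrt{MS_{error}/n}$$

where $q_{\alpha;p,\nu}$ is the critical value of the studentized range
for $p$ means and $\nu$ error degrees of freedom, $MS_{error}$ is the
error mean square from the ANOVA, and $n$ is the common sample size.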
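As a rough illustration of Tukey's critical difference, the sketch below multiplies a studentized range critical value q by the square root of the error mean square divided by the common n, then flags the pairwise differences that exceed it. The group means, error mean square, and sample size are made-up numbers, the function names are hypothetical, and the q value is assumed to have been read from a studentized range table (3.77 is approximately the tabled value for alpha = .05, 3 means, and 12 error df; please confirm it against your own table).

```python
from itertools import combinations
from math import sqrt


def hsd_critical_difference(q_crit: float, ms_error: float, n: int) -> float:
    """Tukey's critical difference: q * sqrt(MS_error / n), equal n per group."""
    return q_crit * sqrt(ms_error / n)


def tukey_hsd_pairs(means: dict, q_crit: float, ms_error: float, n: int) -> dict:
    """Map each pair of group labels to True if |difference| exceeds the HSD."""
    hsd = hsd_critical_difference(q_crit, ms_error, n)
    return {
        (a, b): abs(means[a] - means[b]) > hsd
        for a, b in combinations(means, 2)
    }


# Hypothetical example: 3 groups, n = 5 each, so error df = 12.
means = {"A": 10.0, "B": 12.0, "C": 15.0}
Q_CRIT = 3.77    # assumed table value for alpha = .05, p = 3, df = 12
MS_ERROR = 4.0   # made-up error mean square
N = 5

print(round(hsd_critical_difference(Q_CRIT, MS_ERROR, N), 3))
print(tukey_hsd_pairs(means, Q_CRIT, MS_ERROR, N))
```

With these numbers the critical difference is about 3.37, so only the A versus C difference of 5 would be declared significant.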

A test of the overall null hypothesis
that $\mu_1 = \mu_2 = \cdots = \mu_p$
is provided by a comparison of the largest pairwise
difference between means with the critical difference
$\hat{\psi}(HSD)$,
for which the value of $q$ can be obtained from a table
of the studentized range. This test procedure,
which utilizes a range statistic, is an alternative to
the overall *F* test. For most sets of data, the
range and *F* tests lead to the same decision concerning
the overall null hypothesis. However, the *F* test
is generally more powerful.

__Scheffé's Test__

Scheffé's *S* procedure (1953)
is one of the most flexible, conservative, and robust
data snooping procedures available. If
the overall *F* statistic is significant, Scheffé's procedure
can be used to evaluate all a posteriori contrasts among
means, not just the pairwise comparisons. In addition,
it can be used with unequal *n*'s. The experimentwise
error rate is equal to $\alpha$
for the infinite number of possible contrasts among $p \ge 3$
means. Since an experimenter always evaluates only a subset
of the possible contrasts, Scheffé's procedure tends to
be conservative. It is much less powerful than Tukey's
HSD procedure for evaluating pairwise comparisons, for
example, and consequently is recommended only when complex
contrasts are of interest. Scheffé's procedure uses
the *F* sampling distribution and, like ANOVA, is
robust with respect to nonnormality and heterogeneity
of variance.
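The passage above does not spell out the decision rule, so the following Python sketch uses the form in which it is commonly stated: a contrast is declared significant when its F ratio exceeds (p - 1) times the tabled F critical value with p - 1 and error degrees of freedom. The data are invented, the function name is hypothetical, and the F critical value (3.89, approximately the tabled value for alpha = .05 with 2 and 12 df) is assumed to come from an F table.

```python
def scheffe_contrast_test(coeffs, means, ns, ms_error, f_crit):
    """Evaluate one contrast with Scheffe's criterion.

    coeffs   : contrast coefficients (must sum to zero)
    means    : group means, same order as coeffs
    ns       : group sample sizes (may be unequal)
    ms_error : error mean square from the ANOVA
    f_crit   : tabled F critical value for (p - 1, error df)
    Returns (contrast F, Scheffe threshold, significant?).
    """
    if abs(sum(coeffs)) > 1e-9:
        raise ValueError("contrast coefficients must sum to zero")
    p = len(means)
    psi_hat = sum(c * m for c, m in zip(coeffs, means))
    var_psi = ms_error * sum(c * c / n for c, n in zip(coeffs, ns))
    f_contrast = psi_hat ** 2 / var_psi
    threshold = (p - 1) * f_crit
    return f_contrast, threshold, f_contrast > threshold


# Hypothetical data: 3 groups with unequal n (error df = 15 - 3 = 12).
means = [10.0, 12.0, 15.0]
ns = [4, 5, 6]
MS_ERROR = 4.0
F_CRIT = 3.89  # assumed table value for alpha = .05, df = (2, 12)

# Pairwise contrast: group 1 vs group 2.
print(scheffe_contrast_test([1, -1, 0], means, ns, MS_ERROR, F_CRIT))
# Complex contrast: group 3 vs the average of groups 1 and 2.
print(scheffe_contrast_test([-0.5, -0.5, 1], means, ns, MS_ERROR, F_CRIT))
```

In this made-up example the pairwise contrast falls well short of the threshold of 7.78, while the complex contrast exceeds it, which is the kind of situation in which Scheffé's procedure is suggested.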

__Newman-Keuls Test__

A different
approach to evaluating a posteriori pairwise comparisons stems from the
work of Student (1927), Newman (1939), and Keuls (1952). The Newman-Keuls
procedure is based on a stepwise or layer approach to significance
testing. Sample means are ordered from the smallest to the largest. The
largest difference, which involves means that are $r = p$ steps apart, is
tested first at the $\alpha$
level of significance; if significant, means that are $r = p - 1$ steps
apart are tested at the $\alpha$
level of significance, and so on. The Newman-Keuls procedure provides an
$r$-mean significance level equal to $\alpha$
for each group of $r$ ordered means; that is, the probability of falsely
rejecting the hypothesis that all means in an ordered group are equal is $\alpha$.
It follows that the concept of error rate applies neither on an
experimentwise nor on a per comparison basis; the actual error rate falls
somewhere between the two. The Newman-Keuls procedure, like Tukey's
procedure, requires equal sample *n*'s.

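The critical
difference, $\hat{\psi}(W_r)$,
that two means separated by $r$ steps must exceed to be
declared significant is, according to the Newman-Keuls
procedure,

$$\hat{\psi}(W_r) = q_{\alpha;r,\nu}\sqrt{MS_{error}/n}$$

where $q_{\alpha;r,\nu}$ is the critical value of the studentized range
for $r$ means and $\nu$ error degrees of freedom.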

It should be noted that the Newman-Keuls and Tukey procedures require the same critical difference for the first comparison that is tested. The Tukey procedure uses this critical difference for all of the remaining tests, while the Newman-Keuls procedure reduces the size of the critical difference, depending on the number of steps separating the ordered means. As a result, the Newman-Keuls test is more powerful than Tukey's test. Remember, however, that the Newman-Keuls procedure does not control the experimentwise error rate at $\alpha$.
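The stepwise logic can be sketched in Python as follows. This is only an illustration under stated assumptions: the means, error mean square, and n are invented, the function name `newman_keuls` is hypothetical, and the q values are assumed to be read from a studentized range table (3.00, 3.65, and 4.05 are approximately the tabled values for alpha = .05 with 16 error df and r = 2, 3, 4). The sketch also follows the usual convention that once a range of ordered means is not significant, no pair inside that range is tested.

```python
from math import sqrt


def newman_keuls(means: dict, q_table: dict, ms_error: float, n: int) -> dict:
    """Stepwise Newman-Keuls comparisons for equal-n groups.

    means   : label -> sample mean
    q_table : r -> studentized range critical value for r ordered means
    Returns (low_label, high_label) -> True if declared significant.
    """
    ordered = sorted(means, key=means.get)  # smallest to largest mean
    p = len(ordered)
    se = sqrt(ms_error / n)
    significant = {}
    retained = []  # index spans whose range was not significant
    for r in range(p, 1, -1):  # widest range first
        w_r = q_table[r] * se  # critical difference for r steps
        for i in range(p - r + 1):
            j = i + r - 1
            pair = (ordered[i], ordered[j])
            # Pairs inside a non-significant range are not tested.
            if any(lo <= i and j <= hi for lo, hi in retained):
                significant[pair] = False
                continue
            diff = means[ordered[j]] - means[ordered[i]]
            significant[pair] = diff > w_r
            if not significant[pair]:
                retained.append((i, j))
    return significant


# Hypothetical data: 4 groups, n = 5 each, so error df = 16.
means = {"A": 10.0, "B": 12.0, "C": 15.0, "D": 16.0}
Q_TABLE = {2: 3.00, 3: 3.65, 4: 4.05}  # assumed table values, alpha = .05
result = newman_keuls(means, Q_TABLE, ms_error=4.0, n=5)
for pair, flag in result.items():
    print(pair, flag)
```

With these numbers the B versus C difference of 3 exceeds the two-step critical difference (about 2.68) but not the four-step one (about 3.62) that Tukey's procedure would apply to every pair, which illustrates the power difference described above.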

Frequently
a test of the overall null hypothesis $\mu_1 = \mu_2 = \cdots = \mu_p$
is performed with an *F* statistic in ANOVA
rather than with a range statistic. If the *F* statistic
is significant, Shaffer (1979) recommends using the critical
difference $\hat{\psi}(W_{r-1})$ instead of $\hat{\psi}(W_r)$
to evaluate the largest pairwise comparison at the first
step of the testing procedure. The testing procedure for
all subsequent steps is unchanged. She has shown that
the modified procedure leads to greater power at the first
step without affecting control of the type I error rate.
This makes *dissonances*, in which the overall null
hypothesis is rejected by an *F* test without rejecting
any one of the proper subsets of comparisons, less likely.