Group Comparison

Use this section when your question is whether two or more groups differ on an outcome and you need to choose the group comparison method that matches your design.

This section is for choosing and interpreting group-difference tests, not for prediction or association modeling, time-to-event analysis, or free-form figure design. If your main question is about relationships between variables, time-to-event outcomes, or which figure family to build, start with Regression and Modeling, Survival Analysis, or Figures and Visualization instead.

Comparing group means is one of the most common tasks in quantitative research, and one of the most frequently misreported. Licklider guides you from test selection through assumption checking, execution, and figure-ready output, applying current methodological standards at every step.

Which test should I use?

Use the table below to jump to the leaf page that matches your design. If you are unsure whether your data meet the normality or variance assumptions, run the test anyway - Licklider checks both automatically and flags violations before reporting results.

Groups | Design                      | Distribution                  | Method
-------|-----------------------------|-------------------------------|--------------------------
2      | Independent                 | Normal                        | t-test (Welch)
2      | Paired                      | Normal                        | Paired t-test
2      | Independent                 | Non-normal or ordinal         | Mann-Whitney U
2      | Paired                      | Non-normal or ordinal         | Wilcoxon signed-rank
3+     | Independent, 1 factor       | Normal                        | One-Way ANOVA + post hoc
3+     | Repeated measures, 1 factor | Normal                        | Repeated measures ANOVA
3+     | Mixed, 1 between x 1 within | Normal or assumption-reviewed | Mixed ANOVA
3+     | Independent                 | Non-normal or ordinal         | Kruskal-Wallis + post hoc
3+     | Paired/repeated             | Non-normal or ordinal         | Friedman test
3+     | Independent, 2 factors      | Normal                        | Two-Way ANOVA + post hoc
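
The table can also be read as a simple lookup. The sketch below is illustrative only: the function name, the design labels, and the dictionary are hypothetical and are not Licklider identifiers. The Mixed ANOVA row ("Normal or assumption-reviewed") is filed under the normal branch for simplicity.

```python
# Hypothetical lookup that mirrors the selection table; not a Licklider API.
METHODS = {
    ("2", "independent", "normal"): "t-test (Welch)",
    ("2", "paired", "normal"): "Paired t-test",
    ("2", "independent", "non-normal"): "Mann-Whitney U",
    ("2", "paired", "non-normal"): "Wilcoxon signed-rank",
    ("3+", "independent", "normal"): "One-Way ANOVA + post hoc",
    ("3+", "repeated", "normal"): "Repeated measures ANOVA",
    ("3+", "mixed", "normal"): "Mixed ANOVA",
    ("3+", "independent", "non-normal"): "Kruskal-Wallis + post hoc",
    ("3+", "repeated", "non-normal"): "Friedman test",
    ("3+", "two-factor", "normal"): "Two-Way ANOVA + post hoc",
}


def choose_method(groups: int, design: str, normal: bool) -> str:
    """Return the method named in the table row that matches the design."""
    if groups < 2:
        raise ValueError("a group comparison needs at least two groups")
    size = "2" if groups == 2 else "3+"
    distribution = "normal" if normal else "non-normal"
    key = (size, design, distribution)
    if key not in METHODS:
        raise ValueError(f"no table row for {key}")
    return METHODS[key]


print(choose_method(2, "independent", True))   # t-test (Welch)
print(choose_method(4, "repeated", False))     # Friedman test
```

Combinations that have no row in the table raise an error rather than guessing a method.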

Before you run a test - three things to check

These three assumptions apply across all parametric group comparison methods. Licklider checks each one automatically, but understanding what they mean helps you interpret warnings when they appear.

Normality

Parametric tests assume your data are approximately normally distributed within each group. Licklider runs Shapiro-Wilk on each group and flags deviations in the output panel. For small samples (n < 10 per group), normality tests have low power - a non-significant result does not confirm normality. When sample sizes are very small, consider the non-parametric alternative regardless of the Shapiro-Wilk result.
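
The small-sample caveat can be expressed as a short decision rule. This is a sketch under stated assumptions, not Licklider's internal logic: the function name is hypothetical, the Shapiro-Wilk p-value is assumed to be computed elsewhere (for example with scipy.stats.shapiro), and the alpha of 0.05 is an illustrative choice. Only the n < 10 cutoff comes from the text above.

```python
def normality_advice(n: int, shapiro_p: float,
                     alpha: float = 0.05, small_n: int = 10) -> str:
    """Interpret one group's Shapiro-Wilk p-value in light of its sample size.

    alpha is an assumed flagging threshold; small_n follows the n < 10 guidance.
    """
    if shapiro_p < alpha:
        return "deviation flagged: consider the non-parametric alternative"
    if n < small_n:
        # A non-significant result at small n does not confirm normality.
        return "inconclusive at this n: consider the non-parametric alternative"
    return "no deviation detected"


print(normality_advice(n=6, shapiro_p=0.40))
print(normality_advice(n=30, shapiro_p=0.40))
```

The first call returns the "inconclusive" message even though the test is non-significant, which is the point of the caveat.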

Variance homogeneity

Most ANOVA variants assume equal variances across groups (homoscedasticity). Licklider runs Levene's test and, where appropriate, applies Welch's correction automatically. You do not need to run a variance test before analysis - Licklider handles this by default.
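
For readers who want to see what a Levene-type statistic measures, here is a minimal sketch using only the Python standard library. It implements the median-centered (Brown-Forsythe) variant, which is an assumption: the text does not say which centering Licklider uses. It returns the statistic only; the p-value would come from an F distribution with (k - 1, N - k) degrees of freedom.

```python
from statistics import mean, median


def brown_forsythe_w(*groups):
    """Levene-type W statistic using absolute deviations from group medians."""
    k = len(groups)
    if k < 2 or any(len(g) < 2 for g in groups):
        raise ValueError("need at least two groups with two observations each")
    # Absolute deviation of every observation from its own group's median.
    z = []
    for g in groups:
        centre = median(g)
        z.append([abs(x - centre) for x in g])
    n_total = sum(len(g) for g in z)
    z_means = [mean(g) for g in z]
    z_grand = sum(sum(g) for g in z) / n_total
    between = sum(len(g) * (m - z_grand) ** 2 for g, m in zip(z, z_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    if within == 0:
        raise ValueError("deviations have no spread; statistic is undefined")
    return (n_total - k) / (k - 1) * between / within


# Same spread in both groups gives W = 0; a wider second group gives W > 0.
print(brown_forsythe_w([1, 2, 3], [11, 12, 13]))
print(brown_forsythe_w([1, 2, 3], [0, 2, 4]))
```

A larger W indicates that the typical distance from the group centre differs more between groups.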

Independence

Each observation must come from a different subject, unless you are using a paired or repeated-measures design. Mixing independent and dependent observations without accounting for the structure inflates Type I error. If your data include repeated measurements from the same subject, review the paired / repeated-measures guidance path and related checks before interpreting results.
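
A quick way to spot hidden dependence before running an independent-groups test is to look for subject identifiers that occur more than once. The helper below is a hypothetical sketch and assumes your data carry a subject ID per observation.

```python
from collections import Counter


def repeated_subjects(subject_ids):
    """Return the subject IDs that contribute more than one observation."""
    counts = Counter(subject_ids)
    return sorted(s for s, c in counts.items() if c > 1)


ids = ["s1", "s2", "s1", "s3"]
print(repeated_subjects(ids))  # ['s1']
```

A non-empty result means at least some observations are dependent, so an independent-groups test is not appropriate as-is.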

Methods

Choose the leaf that matches your design: t-Test, One-Way ANOVA and Post Hoc, Two-Way ANOVA and Post Hoc, Repeated Measures ANOVA, or Non-Parametric Alternatives. Each method page covers when to use the test, what assumptions Licklider checks, how to interpret the output, and a worked example.

If your design combines one between-subjects factor with one within-subjects factor, see Mixed ANOVA.

t-Test

When to use: Comparing means between exactly two groups - independent samples, paired measurements, or designs with unequal variances (Welch).

What Licklider provides: Automatic Welch correction, Cohen's d with confidence interval, exact p-value, and a publication-ready strip or box plot with significance overlay.
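
To make the Welch correction concrete, the following sketch computes the Welch t statistic, the Welch-Satterthwaite degrees of freedom, and Cohen's d with only the standard library. The function names are illustrative, and the pooled-SD definition of d is one common convention assumed here; the text does not state which variant Licklider reports. The p-value and the confidence interval for d are not computed.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a), len(b)
    # Squared standard errors from sample variances (n - 1 denominator).
    se1, se2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation (assumed convention)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = welch_t(a, b)
print(f"t({df:.2f}) = {t:.2f}, d = {cohens_d(a, b):.2f}")  # t(5.88) = -1.90, d = -1.20
```

Note that the degrees of freedom are fractional and smaller than n1 + n2 - 2 = 8, which is how the correction accounts for the unequal variances.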

One-Way ANOVA and Post Hoc

When to use: Three or more independent groups with one categorical factor.

What Licklider provides: Classic or Welch omnibus output, choice of Tukey HSD / Bonferroni / Scheffe / Dunnett / Games-Howell post hoc methods, eta^2 and omega^2 where applicable, and pairwise significance brackets on figures.
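
The relationship between the classic omnibus F and the two effect sizes can be seen from the sums of squares. The sketch below uses only the standard library; the function name and return structure are illustrative, and it covers the classic (equal-variance) omnibus test only, not the Welch variant or any post hoc method.

```python
from statistics import mean


def one_way_anova(*groups):
    """Classic one-way ANOVA: F, degrees of freedom, eta^2 and omega^2."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [mean(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    df_between, df_within = k - 1, n_total - k
    ms_within = ss_within / df_within
    ss_total = ss_between + ss_within
    return {
        "F": (ss_between / df_between) / ms_within,
        "df": (df_between, df_within),
        "eta2": ss_between / ss_total,
        "omega2": (ss_between - df_between * ms_within) / (ss_total + ms_within),
    }


result = one_way_anova([1, 2, 3], [2, 3, 4], [3, 4, 5])
print(result["F"], result["df"], result["eta2"])  # 3.0 (2, 6) 0.5
```

In this example omega^2 is about 0.31, lower than eta^2, because omega^2 adjusts for the upward bias of eta^2 in small samples.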

Two-Way ANOVA and Post Hoc

When to use: Two categorical factors, with or without interaction. Common in dose x treatment and genotype x condition designs.

What Licklider provides: Main effects and interaction F-tests, post hoc comparisons, interaction plot, and partial eta^2.
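
As a rough illustration of how the main-effect and interaction F-tests and partial eta^2 are built, the sketch below partitions the sums of squares for a balanced, fully crossed design using only the standard library. The function name and input layout are hypothetical, and unbalanced designs (which require a choice of sums-of-squares type) are deliberately out of scope.

```python
from statistics import mean


def two_way_anova(cells):
    """cells maps (a_level, b_level) -> list of observations (balanced design)."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    na, nb = len(a_levels), len(b_levels)
    sizes = {len(v) for v in cells.values()}
    if na < 2 or nb < 2 or len(cells) != na * nb or len(sizes) != 1:
        raise ValueError("sketch handles balanced, fully crossed designs only")
    n = sizes.pop()
    if n < 2:
        raise ValueError("need at least two observations per cell")
    cell_mean = {key: mean(v) for key, v in cells.items()}
    # With equal cell sizes, marginal and grand means are means of cell means.
    grand = mean(list(cell_mean.values()))
    a_mean = [mean([cell_mean[(a, b)] for b in b_levels]) for a in a_levels]
    b_mean = [mean([cell_mean[(a, b)] for a in a_levels]) for b in b_levels]
    ss_a = nb * n * sum((m - grand) ** 2 for m in a_mean)
    ss_b = na * n * sum((m - grand) ** 2 for m in b_mean)
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_ab = ss_cells - ss_a - ss_b
    ss_err = sum((x - cell_mean[key]) ** 2 for key, v in cells.items() for x in v)
    if ss_err == 0:
        raise ValueError("no within-cell variation; F is undefined")
    df_err = na * nb * (n - 1)
    ms_err = ss_err / df_err
    effects = {"A": (ss_a, na - 1), "B": (ss_b, nb - 1),
               "AxB": (ss_ab, (na - 1) * (nb - 1))}
    return {
        name: {"F": (ss / df) / ms_err, "df": (df, df_err),
               "partial_eta2": ss / (ss + ss_err)}
        for name, (ss, df) in effects.items()
    }


data = {("a1", "b1"): [1, 3], ("a1", "b2"): [3, 5],
        ("a2", "b1"): [5, 7], ("a2", "b2"): [11, 13]}
for effect, stats in two_way_anova(data).items():
    print(effect, stats)
```

For this toy data set the F values are 36, 16, and 4 for A, B, and the interaction, each on (1, 4) degrees of freedom.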

Repeated Measures ANOVA

When to use: Three or more within-subject conditions on the same units, with normality satisfied and a balanced complete design.

What Licklider provides: One-way RM-ANOVA with Mauchly sphericity testing, Greenhouse-Geisser or Huynh-Feldt correction when sphericity is violated, partial eta^2, and validation that every subject appears in every condition exactly once.
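
The completeness check and the uncorrected F can be sketched with the standard library. This is an assumption-laden illustration: the function name and the long-format (subject, condition, value) input are hypothetical, and the sketch does not perform Mauchly's test or apply Greenhouse-Geisser or Huynh-Feldt corrections.

```python
from collections import Counter
from statistics import mean


def rm_anova(rows):
    """One-way repeated measures ANOVA from (subject, condition, value) rows."""
    rows = list(rows)
    counts = Counter((s, c) for s, c, _ in rows)
    subjects = sorted({s for s, _, _ in rows})
    conditions = sorted({c for _, c, _ in rows})
    # Every subject must appear in every condition exactly once.
    bad = [(s, c) for s in subjects for c in conditions if counts[(s, c)] != 1]
    if bad:
        raise ValueError(f"incomplete or duplicated subject/condition pairs: {bad}")
    n, k = len(subjects), len(conditions)
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two conditions")
    value = {(s, c): v for s, c, v in rows}
    grand = mean(list(value.values()))
    cond_mean = [mean([value[(s, c)] for s in subjects]) for c in conditions]
    subj_mean = [mean([value[(s, c)] for c in conditions]) for s in subjects]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_mean)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_mean)
    ss_total = sum((v - grand) ** 2 for v in value.values())
    ss_err = ss_total - ss_cond - ss_subj
    if ss_err <= 0:
        raise ValueError("no residual variation; F is undefined")
    df_cond, df_err = k - 1, (k - 1) * (n - 1)
    return {"F": (ss_cond / df_cond) / (ss_err / df_err),
            "df": (df_cond, df_err),
            "partial_eta2": ss_cond / (ss_cond + ss_err)}


rows = [("s1", "c1", 1), ("s1", "c2", 3), ("s1", "c3", 5),
        ("s2", "c1", 2), ("s2", "c2", 3), ("s2", "c3", 7),
        ("s3", "c1", 3), ("s3", "c2", 6), ("s3", "c3", 6)]
print(rm_anova(rows))  # F = 12.0 on (2, 4) degrees of freedom
```

Removing between-subject variation from the error term is what distinguishes this from a one-way ANOVA on the same numbers.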

Non-Parametric Alternatives

When to use: Small samples, non-normal distributions, or ordinal data - as a methodological substitute for each parametric test above.

What Licklider provides: Mann-Whitney U, Wilcoxon signed-rank, Kruskal-Wallis, and Friedman tests are available in the current group comparison runtime. Output reports U/W/H statistics, rank-biserial r, and epsilon^2 or Kendall's W as applicable. See the Non-Parametric Alternatives page for the design-to-test map and current support boundary.
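
To show how two of these statistics and their effect sizes relate, the sketch below computes Mann-Whitney U with rank-biserial r, and a tie-corrected Kruskal-Wallis H with epsilon^2, using only the standard library. Function names are illustrative. The sign convention for r (positive when the first sample tends to be larger) and the formula epsilon^2 = H / (N - 1) are common conventions assumed here; p-values are not computed.

```python
from collections import Counter


def mann_whitney_u(a, b):
    """U for sample a: number of (a, b) pairs with a > b, ties counting 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def rank_biserial(a, b):
    """Rank-biserial r: share of pairs favouring a minus share favouring b."""
    return 2 * mann_whitney_u(a, b) / (len(a) * len(b)) - 1


def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Tie-corrected H statistic and epsilon^2 = H / (N - 1)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    h /= correction
    return h, h / (n - 1)


print(rank_biserial([1, 2, 3], [4, 5, 6]))                   # -1.0
print(kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9]))       # H near 7.2, epsilon^2 near 0.9
```

Both effect sizes are bounded, which makes them easier to compare across studies than the raw U or H values.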

Reporting your results

A complete group comparison report includes all of the following:

  • The test statistic and degrees of freedom - for example, t(28) = 3.42, F(2, 45) = 7.11
  • An exact p-value, not a boundary statement such as p < 0.05
  • An effect size with confidence interval - Cohen's d, eta^2, or epsilon^2 depending on the test
  • Sample size per group

Licklider's export bundle includes a methods-and-results text snippet pre-formatted to meet these requirements. Threshold labels such as "significant" and "non-significant" are intentionally omitted from Licklider output - exact values give readers the information needed to draw their own conclusions.
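
The four reporting elements can be assembled into a single results string. The formatter below is a hypothetical sketch, not the format of Licklider's export snippet, and the numbers passed to it are illustrative placeholders rather than results from a real analysis.

```python
def format_t_report(t, df, p, d, ci, n1, n2):
    """Build a results string with statistic, df, exact p, effect size, CI, and n."""
    low, high = ci
    return (
        f"t({df:.4g}) = {t:.2f}, p = {p:.3g}, "
        f"d = {d:.2f}, 95% CI [{low:.2f}, {high:.2f}], "
        f"n = {n1} and {n2}"
    )


# Placeholder values for illustration only.
print(format_t_report(t=3.42, df=28, p=0.0019, d=1.25, ci=(0.46, 2.03), n1=15, n2=15))
# t(28) = 3.42, p = 0.0019, d = 1.25, 95% CI [0.46, 2.03], n = 15 and 15
```

The p-value is printed as an exact value at three significant figures instead of being collapsed into a threshold statement.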

Design Rationale & References

Licklider's design choices

Licklider defaults to Welch's t-test for all two-group parametric comparisons without requiring a prior variance test [1]. For one-way ANOVA, Tukey HSD is the default post hoc method. Bonferroni is available as a more conservative alternative and is surfaced with a warning when replicate counts are low, reflecting recent evidence that Tukey HSD may not adequately control false positives in low-replicate life science experiments [3]. Effect sizes are reported alongside all test results because p-values alone do not convey the magnitude or practical importance of a difference [6]: Cohen's d for t-tests, eta^2-family measures for ANOVA, rank-biserial r for Mann-Whitney and Wilcoxon, epsilon^2 for Kruskal-Wallis, and Kendall's W for Friedman [4, 5]. See Non-Parametric Alternatives for the full rank-based test map. Exact p-values replace threshold-based significance labels throughout the interface, consistent with ASA guidance [7, 8].

Methodological foundations

  1. Delacre, M., Lakens, D., & Leys, C. (2017). Why psychologists should by default use Welch's t-test instead of Student's t-test. International Review of Social Psychology, 30(1), 92–101.
  2. Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, 863.
  3. Sullivan, G. M., & Feinn, R. (2012). Using effect size — or why the p-value is not enough. Journal of Graduate Medical Education, 4(3), 279–282.

Known limitations

  4. Nakagawa, S. (2004). A farewell to Bonferroni: The problems of low statistical power and publication bias. Behavioral Ecology, 15(6), 1044–1045.
  5. Zweifach, A. (2025). Bonferroni's method, not Tukey's, should be used to control the total number of false positives when making multiple pairwise comparisons in experiments with few replicates. SLAS Discovery, 35, 100253.
  6. Fagerland, M. W. (2012). t-tests, non-parametric tests, and large studies - a paradox of statistical practice? BMC Medical Research Methodology, 12, 78.

Paradigm shifts worth knowing

  7. Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129–133.
  8. Wasserstein, R. L., Schirm, A. L., & Lazar, N. A. (2019). Moving to a world beyond "p < 0.05". The American Statistician, 73(sup1), 1–19.