Normality and Homoscedasticity

Many statistical tests make assumptions about the distribution of the data. Two of the most common are normality (the outcome variable is approximately normally distributed within each group) and homoscedasticity (the variance is roughly equal across groups).

Normality and homoscedasticity checks run automatically for all group comparison chart types: group comparison, bar, box, violin, dot, and strip plot. For scatter and regression charts with an explicit linear regression run, normality is checked on the residuals of the fitted model rather than on raw groups.

These checks are always on: they run every time a figure is generated, without the researcher having to request them. As a result, test selection starts from assumptions that have been checked rather than assumed.

These checks are useful inputs to test selection, but they are not a complete validation of the analysis design. They tell you about distribution shape and variance structure, not whether the rows are the right observation units or whether the study design itself supports the requested test.

Normality checks

Licklider tests normality using the Shapiro-Wilk test by default. For group comparisons, the test is applied to each group separately. For scatter and regression charts with an explicit linear regression run, normality is assessed on residuals (see below).

Shapiro-Wilk is the default because it has strong power for detecting many common departures from normality in small to moderate samples and is widely used in applied analysis workflows [1].

Two additional normality tests are also available on request:

Kolmogorov-Smirnov test: a distribution-free test that compares the sample to a reference distribution
D'Agostino-Pearson test: a test based on the skewness and kurtosis of the sample

The result for each group is either normal or non-normal, based on the p-value from the test. When all groups pass the normality test, the overall normality assessment is positive. When any group fails, the overall assessment is negative.
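The aggregation rule above can be sketched in a few lines. This is a minimal illustration, not Licklider's implementation: the function name, the input shape, and the 0.05 threshold are assumptions, because this page does not state the cutoff that Licklider applies.

```python
from typing import Dict


def summarize_normality(p_values: Dict[str, float], alpha: float = 0.05) -> dict:
    """Label each group and derive the overall assessment.

    alpha is an assumed threshold, used here for illustration only.
    """
    per_group = {
        group: ("normal" if p >= alpha else "non-normal")
        for group, p in p_values.items()
    }
    # The overall assessment is positive only when every group passes.
    all_normal = all(label == "normal" for label in per_group.values())
    return {"per_group": per_group, "all_normal": all_normal}


result = summarize_normality({"control": 0.41, "treated": 0.03})
print(result["per_group"])
print(result["all_normal"])
```

A single failing group is enough to make the overall assessment negative, which is why one small or skewed group can move the whole figure onto the non-parametric path.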

Homoscedasticity check

Equal variance across groups is tested using Levene's test (median-centered). This test is less sensitive to departures from normality than Bartlett's test and is appropriate for most life sciences data.

Licklider uses the median-centered form because it is more robust than the classic mean-centered version when the data are skewed or heavy-tailed [2, 3].
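To make the median-centered form concrete, the sketch below computes the test statistic using only the Python standard library. It follows the textbook Brown-Forsythe definition and is not a copy of Licklider's own code; the function name is hypothetical. The p-value is left out because it needs the F distribution with (k - 1, N - k) degrees of freedom, which the standard library does not provide.

```python
from statistics import fmean, median
from typing import Sequence


def levene_median_statistic(groups: Sequence[Sequence[float]]) -> float:
    """Return the median-centered Levene (Brown-Forsythe) W statistic.

    Assumes at least two groups and deviations that are not all identical
    within every group (otherwise the denominator is zero).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group median.
    z = [[abs(x - median(g)) for x in g] for g in groups]
    z_means = [fmean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # One-way ANOVA F ratio computed on the deviations.
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


w = levene_median_statistic([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]])
print(round(w, 4))
```

Centering on the median rather than the mean is the only difference from the classic Levene statistic: a single extreme value shifts a group mean much more than a group median, so the deviations, and therefore W, are less affected by skew.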

For group-comparison figures, Levene's test runs as part of the same always-on assumption pass as Shapiro-Wilk. The result indicates whether the variances can be treated as equal across groups, and it is reported alongside parametric tests to support disclosure and interpretation.

Licklider computes Levene's test on the TypeScript side using median-centered deviations. When ANOVA is also run through the Python engine, the engine returns its own Levene result as a reference value. The TypeScript result is authoritative for test selection; the engine value is recorded for audit purposes.

Regression residual normality (scatter and regression)

For scatter and regression charts with an explicit linear regression run, Licklider checks normality on the residuals of the ordinary least squares fit rather than on raw data groups. The Shapiro-Wilk test is applied to the residuals, and the result determines whether Pearson or Spearman correlation is selected as the primary measure.
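The quantity being tested here is the set of residuals, not the raw x or y values. The sketch below shows how residuals from a single-predictor ordinary least squares fit are obtained; the function name is hypothetical, and the normality test itself is not implemented (in Python, a Shapiro-Wilk implementation is available as scipy.stats.shapiro, though this page does not say which implementation Licklider calls).

```python
from statistics import fmean
from typing import List, Sequence


def ols_residuals(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Fit y = intercept + slope * x by least squares and return residuals."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual = observed value minus fitted value.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
residuals = ols_residuals(x, y)
print([round(r, 2) for r in residuals])
```

Residuals from a least squares fit with an intercept always sum to zero, so the normality check is about their shape (skew, tails), not their location.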

For multiple-predictor models, the residual normality check is not yet supported. In this case, both Pearson and Spearman are reported without a primary designation.

How these checks affect test selection

Normality and homoscedasticity results are the primary inputs to Licklider's automatic test selection logic for group comparisons. For scatter and regression charts, residual normality drives the primary correlation choice.

For two-group comparisons:

Normality	Design	Selected test
All groups normal	Independent	Welch's t-test
All groups normal	Paired	Paired t-test
Any group non-normal	Independent	Mann-Whitney U
Any group non-normal	Paired	Wilcoxon signed-rank

For three or more groups:

Normality	Design	Selected test
All groups normal	Independent	One-way ANOVA
All groups normal	Paired/Repeated	Repeated measures ANOVA
All groups normal	Mixed (between x within)	Mixed ANOVA
Any group non-normal	Independent	Kruskal-Wallis
Any group non-normal	Paired/Repeated	Friedman
Any group non-normal	Mixed (between x within)	Mixed ANOVA
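The two group-comparison tables above can be read as a single lookup keyed on group count, normality, and design. The sketch below encodes exactly those rows; the function name, the key layout, and the design labels ("independent", "paired", "mixed") are illustrative assumptions, not Licklider identifiers.

```python
from typing import Dict, Tuple

# Key: (three_or_more_groups, all_groups_normal, design)
SELECTION_TABLE: Dict[Tuple[bool, bool, str], str] = {
    (False, True, "independent"): "Welch's t-test",
    (False, True, "paired"): "Paired t-test",
    (False, False, "independent"): "Mann-Whitney U",
    (False, False, "paired"): "Wilcoxon signed-rank",
    (True, True, "independent"): "One-way ANOVA",
    (True, True, "paired"): "Repeated measures ANOVA",
    (True, True, "mixed"): "Mixed ANOVA",
    (True, False, "independent"): "Kruskal-Wallis",
    (True, False, "paired"): "Friedman",
    (True, False, "mixed"): "Mixed ANOVA",
}


def select_group_test(n_groups: int, all_normal: bool, design: str) -> str:
    """Return the test named in the tables for this combination."""
    if n_groups < 2:
        raise ValueError("a group comparison needs at least two groups")
    key = (n_groups >= 3, all_normal, design)
    if key not in SELECTION_TABLE:
        raise ValueError(f"no rule for {n_groups} groups with design {design!r}")
    return SELECTION_TABLE[key]


print(select_group_test(2, False, "independent"))
print(select_group_test(4, False, "mixed"))
```

Two things are visible in this form: the homoscedasticity result is not part of the key, and the mixed design maps to Mixed ANOVA whether or not all groups pass the normality check.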

For scatter and regression charts:

Residual normality	Primary correlation
Normal	Pearson correlation (primary)
Non-normal	Spearman correlation (primary)
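The correlation rule, together with the multiple-predictor exception described earlier on this page, can be sketched as follows. The function name and the use of None to mean "no primary designation" are assumptions made for this example.

```python
from typing import Optional


def select_primary_correlation(
    n_predictors: int, residuals_normal: Optional[bool]
) -> Optional[str]:
    """Return the primary correlation measure, or None when none is designated.

    residuals_normal is None when the residual normality check did not run.
    """
    if n_predictors > 1 or residuals_normal is None:
        # Both Pearson and Spearman are reported, with no primary measure.
        return None
    return "Pearson" if residuals_normal else "Spearman"


print(select_primary_correlation(1, True))
print(select_primary_correlation(1, False))
print(select_primary_correlation(3, None))
```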

The homoscedasticity result informs how variance assumptions are handled and disclosed within parametric tests, but it does not change which test family is selected for multi-group comparisons. Welch's t-test is used by default for independent two-group comparisons because it is valid whether or not variances are equal.
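The reason Welch's t-test does not need an equal-variance precondition is visible in its formula: each group contributes its own variance to the standard error, and the degrees of freedom are adjusted with the Welch-Satterthwaite approximation. The sketch below uses the standard textbook formulas and only the Python standard library; the function name is hypothetical, and the p-value is omitted because it requires the t distribution.

```python
from math import sqrt
from statistics import fmean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Per-group squared standard errors; the variances are never pooled.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (fmean(a) - fmean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Unequal variances: df drops below the pooled value of n1 + n2 - 2 = 8.
t_unequal, df_unequal = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
# Equal variances and equal sizes: df equals the pooled value of 4.
t_equal, df_equal = welch_t([1, 2, 3], [2, 3, 4])
print(round(t_unequal, 3), round(df_unequal, 3))
print(round(t_equal, 3), round(df_equal, 3))
```

When the two groups have the same size and the same sample variance, the adjusted degrees of freedom match those of the pooled-variance t-test, so little is lost by using Welch's version by default.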

This automation reduces one common source of error, but it does not settle every design question. Licklider cannot determine from these checks alone whether non-normality is driven by outliers, whether rows that look separate are actually repeated measurements, or whether a paired design was specified correctly. Those issues can change which test is appropriate even when the normality and variance results are correctly reported.

For more detail on test selection logic → see Choose the Right Test.

Where to review the results

Normality and homoscedasticity results are visible in the Stats panel of the figure Inspector. The panel shows the test used, the result for each group (or for residuals in regression), and the p-value.

The overall test selection — which test was chosen and on what basis — is also shown in the Stats panel alongside the results.

In practice, you should expect to see three kinds of output together: the named check that ran (Shapiro-Wilk, Levene), the per-group or residual result with p-values, and the downstream consequence for test selection or warning state.

When automatic selection changes the primary analysis path, Licklider also records a short explanation of why that switch happened. For example, if Mann-Whitney U is chosen instead of Welch's t-test, the figure view shows the normality results in the Stats panel, and the Inspector's Dataset and Assurance sections preserve the rationale for the selected method.

The consistency of these automatic switches is also reviewed at the project level. The Project Audit checks whether figures with comparable assumption outcomes received consistent primary test selection. See Project Statistical Policy and Consistency Audit.
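One way to picture this kind of consistency check is to group figures by their assumption outcome and flag any group that ended up with more than one primary test. The sketch below is purely illustrative: the record fields (n_groups, design, all_normal, primary_test) and the definition of "comparable" are assumptions, and this page does not describe how the Project Audit is actually implemented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_inconsistent_selections(figures: Iterable[dict]) -> Dict[Tuple, Set[str]]:
    """Return assumption outcomes that map to more than one primary test."""
    tests_by_outcome: Dict[Tuple, Set[str]] = defaultdict(set)
    for fig in figures:
        # Hypothetical definition of "comparable": same group count,
        # same design, same overall normality result.
        key = (fig["n_groups"], fig["design"], fig["all_normal"])
        tests_by_outcome[key].add(fig["primary_test"])
    return {k: v for k, v in tests_by_outcome.items() if len(v) > 1}


figures = [
    {"n_groups": 2, "design": "independent", "all_normal": False,
     "primary_test": "Mann-Whitney U"},
    {"n_groups": 2, "design": "independent", "all_normal": False,
     "primary_test": "Welch's t-test"},  # e.g. a manual override
    {"n_groups": 3, "design": "independent", "all_normal": True,
     "primary_test": "One-way ANOVA"},
]
flagged = find_inconsistent_selections(figures)
print(flagged)
```

A flagged outcome is not necessarily an error. A recorded override can be a legitimate reason for two comparable figures to use different tests.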

Interpreting a non-normal result

A non-normal result from a Shapiro-Wilk test means the test found evidence that the data in that group depart from a normal distribution. This has implications for which tests are valid, but it does not automatically mean the data are problematic.

Some points to keep in mind:

Shapiro-Wilk is sensitive to sample size. With large samples, small deviations from normality that have no practical impact on test validity will produce significant results.
Parametric tests (t-test, ANOVA) are reasonably robust to mild departures from normality, especially when group sizes are equal and reasonably large.
The non-parametric alternatives that Licklider selects when normality fails are valid without the normality assumption, but they test a different hypothesis (rank-based comparisons rather than comparisons of means).
A statistically significant normality test does not mean the non-parametric path is necessarily the best scientific choice. Sample size, group balance, outliers, and the actual analysis question still matter.
A non-normal result can reflect outliers, mixture distributions, ceiling effects, or other data issues that the normality check does not diagnose by itself.
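The point about rank-based tests answering a different question can be shown with a small example. The sketch below computes the Mann-Whitney U statistic directly from pairwise comparisons (the standard definition, with ties counted as one half) and compares it with the difference in means before and after a log transform. The data are made up for illustration, and the function name is hypothetical.

```python
from math import log
from statistics import fmean
from typing import Sequence


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """U for sample a: number of (a, b) pairs with a > b, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


a = [1.0, 2.0, 3.0, 100.0]  # contains one extreme value
b = [4.0, 5.0, 6.0, 7.0]
log_a = [log(v) for v in a]
log_b = [log(v) for v in b]

u_raw = mann_whitney_u(a, b)
u_log = mann_whitney_u(log_a, log_b)
mean_diff_raw = fmean(a) - fmean(b)
mean_diff_log = fmean(log_a) - fmean(log_b)

print(u_raw, u_log)
print(round(mean_diff_raw, 3), round(mean_diff_log, 3))
```

U is identical on the raw and log scales because only the ordering of values matters, while the difference in means changes sign once the extreme value is compressed. A rank-based result and a mean-based result can therefore disagree without either being wrong; they describe different features of the data.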

If the automatic test selection does not match your analysis plan, you can override it in the Chat. The override is recorded in the analysis record.

Design rationale and references

Licklider uses Shapiro-Wilk as the default normality check because it is a strong general-purpose test for small and moderate sample sizes and is widely accepted for practical assumption screening [1]. Alternative tests are available on request because different contexts may prioritize different sensitivities, but the default should work well for most routine group comparison workflows.

Licklider uses the median-centered Levene test rather than Bartlett's test because variance checks often have to operate under imperfect normality. This median-centered form, known as the Brown-Forsythe variant, is less distorted by non-normal data and is therefore a safer default for applied biological data than a more fragile equal-variance test [2, 3].

Welch's t-test remains the default independent two-group parametric path because it controls Type I error well when variances differ and loses little power when variances happen to be equal [4]. That is why homoscedasticity informs interpretation and warning state here, but does not force a separate preliminary "variance equality first" decision rule for independent two-group testing.

Methodological foundations

[1] Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52(3-4), 591-611. -> Original source of the Shapiro-Wilk test and the basis for its use as Licklider's default normality check.
[2] Levene, H. (1960). Robust tests for equality of variances. In I. Olkin (Ed.), Contributions to Probability and Statistics (pp. 278-292). Stanford University Press. -> Foundational reference for Levene-style variance testing.
[3] Brown, M. B., & Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association, 69(346), 364-367. -> Establishes the median-centered form as a more robust equal-variance check under non-normality.
[4] Delacre, M., Lakens, D., & Leys, C. (2017). Why psychologists should by default use Welch's t-test instead of Student's t-test. International Review of Social Psychology, 30(1), 92-101. -> Direct support for using Welch's t-test as the default independent two-group parametric path.

Current support boundary

This page covers normality and equal-variance checks only. These checks do not by themselves verify independence, pairing structure, or the absence of pseudoreplication.
Licklider does not automatically determine whether a significant normality result is caused by a few extreme outliers, hidden subgroups, or a genuinely incompatible data-generating process.
Licklider does not automatically judge whether a non-parametric fallback is scientifically preferable in cases where parametric methods would still be robust enough for the sample size and design.
The checks described here inform test selection, but they are only one part of the broader assumption guard and should be read alongside independence and outlier checks.

What this page does not cover

How to change the selected test → see Choose the Right Test
How the variance assumption is handled within ANOVA → see One-Way ANOVA and Post Hoc
How robustness to outliers is evaluated → see Outlier Sensitivity Report
How observation units and independence are defined → see Observation Unit Declaration