t-Test

The t-test compares means between one or two groups. Licklider runs Welch's correction by default for independent two-group designs, checks normality and variance homogeneity automatically, reports Cohen's d alongside every result, and explains the design limits it cannot detect from the data alone.

When to use a t-test

Use a t-test when you have a continuous outcome variable and you want to compare:

  • A single group's mean against a known or theoretical value (one-sample)
  • Two independent groups — subjects in different conditions with no pairing (independent two-sample)
  • The same subjects measured under two conditions, or matched pairs (paired two-sample)

If you have three or more groups, use One-Way ANOVA instead. If your data are ordinal, or clearly non-normal with small samples, see Non-Parametric Alternatives for the rank-based tests (Mann-Whitney U, Wilcoxon signed-rank) available in the current product.

Variants

Welch's t-test (independent, unequal variances) — default

Licklider's default for independent two-group comparisons. Welch's correction adjusts the degrees of freedom using the Welch-Satterthwaite equation, making the test valid regardless of whether the two groups have equal variances. Because the cost in power when variances happen to be equal is negligible, there is no reason to test for variance equality first — Welch's t-test is appropriate in all independent two-group designs [1].
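The Welch-Satterthwaite adjustment can be sketched with SciPy. This is an illustrative example with made-up data, not Licklider's internal code; `equal_var=False` is what selects Welch's correction in `scipy.stats.ttest_ind`.

```python
# Welch's t-test with the Welch-Satterthwaite df, sketched in SciPy.
import numpy as np
from scipy import stats

a = np.array([87.1, 89.4, 84.2, 90.3, 86.5, 88.0])
b = np.array([79.0, 82.5, 77.8, 80.1, 81.3, 78.6])

# equal_var=False selects Welch's correction (unequal variances allowed)
t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)

# Welch-Satterthwaite degrees of freedom, computed by hand; note the
# result is generally a non-integer.
va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
```

The hand-computed `df` matches what the test uses internally: it falls between the smaller group's n − 1 and the pooled n₁ + n₂ − 2.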

Student's t-test (independent, equal variances)

The classic formulation, which pools variance across groups. Valid only when group variances are genuinely equal. Licklider makes this available as an alternative, but Welch's t-test is recommended as the default for all independent designs [1]. If you select Student's t-test and Levene's test flags unequal variances, Licklider displays a warning in the assumptions panel.

Paired t-test

Use when the same subject appears in both conditions, or when subjects are deliberately matched one-to-one (for example, before/after measurements, littermate controls). The test operates on the within-pair differences, giving it more power than an independent design when pairing is effective.

<div class="docs-callout"> <p> <strong>Paired design:</strong> specify a <strong>subject / block ID column</strong> (<code>pair_column</code>) so each subject appears exactly once per group. Rows without a matching pair in the other group are excluded. Pairing by input order alone is not used when a pair ID column is provided. </p> </div>
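The pair-alignment behaviour described in the callout can be sketched with pandas and SciPy. The column names (`subject`, `group`, `value`) are illustrative, not Licklider's actual schema; the drop of unmatched rows mirrors the "rows without a matching pair are excluded" rule.

```python
# Hypothetical sketch: align rows by a pair ID, then run a paired t-test.
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "subject": [1, 1, 2, 2, 3, 3, 4],            # subject 4 lacks an "after" row
    "group":   ["before", "after"] * 3 + ["before"],
    "value":   [10.2, 12.1, 9.8, 11.5, 10.9, 12.8, 10.0],
})

# One row per subject; subjects missing either condition are dropped.
wide = df.pivot(index="subject", columns="group", values="value").dropna()

# The test operates on within-pair differences (after - before).
t_stat, p_value = stats.ttest_rel(wide["after"], wide["before"])
```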

One-sample t-test

Compares a single group's mean against a fixed reference value (μ<sub>0</sub>). Common uses include comparing a measurement against a published norm, a regulatory threshold, or a pre-specified target. Set μ<sub>0</sub> in the analysis options; the default is 0.
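A one-sample comparison against a reference value can be sketched as follows; the data and the choice of μ₀ = 100 are illustrative.

```python
# One-sample t-test against a fixed reference value mu0, via SciPy.
import numpy as np
from scipy import stats

measurements = np.array([101.2, 99.8, 100.5, 102.1, 100.9])
mu0 = 100.0  # reference value; Licklider's default is 0

t_stat, p_value = stats.ttest_1samp(measurements, popmean=mu0)
```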

Assumptions

Licklider checks the following assumptions automatically for all t-test variants. Normality and variance checks run in a pipeline separate from the t-test computation; they surface in the Assumptions panel (not inside the t-test result object).

These checks reduce common mistakes, but they do not validate your study design for you. In particular, Licklider cannot infer the correct observation unit, cannot tell whether a one-tailed hypothesis was truly pre-specified, and cannot determine whether a pairing variable is scientifically valid unless that structure is explicit in your data contract.

Normality

Licklider runs Shapiro-Wilk on each group and flags groups where p ≤ 0.05 as potentially non-normal. Shapiro-Wilk is skipped for groups with fewer than 3 or more than 5,000 observations.

Important: with small samples (n < 10 per group), Shapiro-Wilk has low power and will often return a non-significant result even when the distribution departs from normality. A non-significant result in a small sample does not confirm normality. For n < 10, consider the rank-based tests described in Non-Parametric Alternatives.
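The check described above can be sketched as a small wrapper around SciPy's Shapiro-Wilk, including the 3–5,000 observation bounds. The function name and return convention are illustrative, not Licklider's API.

```python
# Sketch of the normality check: Shapiro-Wilk per group, skipped
# outside the 3-5,000 observation range.
import numpy as np
from scipy import stats

def check_normality(values, alpha=0.05):
    n = len(values)
    if n < 3 or n > 5000:
        return None              # check skipped, as described above
    _, p = stats.shapiro(values)
    return p <= alpha            # True -> flagged as potentially non-normal

rng = np.random.default_rng(0)
flag = check_normality(rng.normal(size=50))
```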

Variance homogeneity (independent designs only)

Licklider runs Levene's test (median-based) and flags the comparison as having unequal variances when p < 0.05. Because Licklider defaults to Welch's correction, unequal variance does not invalidate the result — the flag is informational. If you have switched to Student's t-test manually and this flag appears, switch back to Welch's t-test.
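The median-based variant corresponds to `center="median"` in SciPy's Levene test (sometimes called the Brown-Forsythe variant). The data here are illustrative, with group b deliberately far more spread out than group a.

```python
# Levene's test with the median centre, matching the "median-based"
# description above.
import numpy as np
from scipy import stats

a = np.array([5.1, 5.4, 4.9, 5.2, 5.0, 5.3])
b = np.array([2.0, 9.5, 1.2, 8.8, 3.1, 10.4])   # much more spread

stat, p_value = stats.levene(a, b, center="median")
unequal_variances = p_value < 0.05   # informational flag under Welch's default
```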

Independence

Each row must represent a different subject, unless you are using the paired design. Repeated measurements from the same subject in an independent design inflate Type I error and cannot be corrected post hoc.

Important: Licklider does not automatically detect pseudoreplication or hidden non-independence when rows look separate in the table. If multiple rows come from the same animal, plate, well, litter, cage, or technical replicate set, a simple independent t-test can underestimate uncertainty and make p-values look smaller than they should.

Use a paired t-test only when the same subject or a true matched pair appears once in each group and that pairing was part of the design. If your data are clustered, repeatedly measured, or nested, declare the observation unit explicitly during setup and review the Observation Unit Declaration and Paired vs Unpaired Guard guidance before relying on this result.

Data scope

The analysis requires at least two valid rows in total after cleaning, and each group must have at least two valid observations; otherwise Licklider skips the t-test. At most 20,000 rows are sampled for the figure pipeline.
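The per-group validity guard can be sketched as below. The cleaning rules shown (dropping `None` and NaN) are an assumption for illustration; Licklider's actual cleaning pipeline may differ.

```python
# Illustrative sketch of the validity guard: drop missing values,
# then require at least two observations per group.
import math

def group_sizes_ok(groups, min_per_group=2):
    cleaned = [[v for v in g if v is not None and not math.isnan(v)]
               for g in groups]
    return all(len(g) >= min_per_group for g in cleaned)

ok = group_sizes_ok([[1.0, 2.0, float("nan")], [3.0, 4.0]])
```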

Reading the output

Licklider's t-test panel reports:

| Field | What it means |
| --- | --- |
| t | Test statistic. Sign indicates direction of the difference. |
| df | Degrees of freedom. For Welch's t-test this is a non-integer (Welch-Satterthwaite approximation). |
| p-value | Exact two-tailed probability (or one-tailed if specified). |
| Mean A / Mean B | Group means (or sample mean for one-sample). |
| Mean difference | A − B (or mean − μ<sub>0</sub>). Positive values mean group A is higher. |
| 95% CI | Confidence interval on the mean difference, not on the effect size. |
| Cohen's d | Standardised effect size. See interpretation guide below. |
| n | Sample size per group (or pair count for paired). |

Effect size interpretation (Cohen's d)

These are conventional benchmarks, not rigid thresholds. Effect sizes should always be interpreted in the context of the field and the measurement scale [2].

| d | Conventional label |
| --- | --- |
| 0.2 | Small |
| 0.5 | Medium |
| 0.8 | Large |
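For independent groups, a common formulation of Cohen's d divides the mean difference by a pooled standard deviation. This sketch uses that formulation; Licklider's exact estimator is not specified here and may differ (e.g. Hedges' correction for small samples).

```python
# Cohen's d with a pooled standard deviation (one common formulation).
import numpy as np

def cohens_d(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) \
                 / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

d = cohens_d([87, 89, 84, 90, 86, 88], [79, 83, 78, 80, 81, 79])
```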

One- vs two-tailed tests

The default is two-tailed (testing whether the means differ in either direction). Choose a one-tailed test only if you pre-specified the direction of the effect before collecting data. Switching to one-tailed after seeing the data to achieve p < 0.05 inflates Type I error.

Licklider cannot determine from the dataset whether a one-tailed direction was genuinely specified in advance. That decision must come from your protocol, not from the observed result.
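In SciPy terms, the tail choice is the `alternative` parameter; when the observed difference lies in the pre-specified direction, the one-tailed p is exactly half the two-tailed p. The data below are illustrative, and the parameter names follow `scipy.stats`, not Licklider's UI.

```python
# Two-tailed (default) vs pre-specified one-tailed test in SciPy.
import numpy as np
from scipy import stats

a = np.array([12.1, 11.8, 12.5, 12.0, 11.9])
b = np.array([11.2, 11.0, 11.5, 10.9, 11.3])

_, p_two = stats.ttest_ind(a, b, equal_var=False)  # two-sided default
_, p_one = stats.ttest_ind(a, b, equal_var=False,
                           alternative="greater")  # pre-specified: A > B
```

Because the difference here falls in the hypothesized direction, `p_one == p_two / 2`; choosing the tail after seeing that is exactly the practice warned against above.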

What Licklider does not decide for you

Even with automatic assumption checks, some statistical mistakes remain a design question rather than a software-detectable error:

  • Observation unit: Licklider cannot infer whether each row is a biological replicate, a technical replicate, or a repeated measurement of the same underlying unit.
  • Pairing validity: Licklider can use a pair_column, but it cannot know whether that pairing reflects the real scientific design or an after-the-fact convenience match.
  • Tail direction: Licklider can run a one-tailed test when requested, but it cannot verify that the directional hypothesis was specified before looking at the data.

If any of these are uncertain, pause before interpreting the t-test and review the Group Comparison overview, Non-Parametric Alternatives, and your study setup choices. The goal of the automatic checks is to surface common warning signs, not to certify that every design is safe for a t-test.

Example

Scenario

A researcher measures cell viability (%) in a control group (n = 12) and a treatment group (n = 14). Groups are independent; the researcher has no prior reason to assume equal variances.

Result (hypothetical)

Welch's t-test: t(21.4) = −2.83, p = .010

Mean control: 87.3% | Mean treatment: 79.6%

Mean difference: −7.7% (95% CI [−13.4, −2.0])

Cohen's d = 0.74 (medium-large)

Interpretation

The treatment group showed lower mean viability than the control group. The difference of 7.7 percentage points was statistically detectable (p = .010) and of medium-large magnitude (d = 0.74). The 95% confidence interval [−13.4, −2.0] excludes zero, consistent with a real difference in the population. Whether a 7.7-point difference in viability is biologically meaningful requires domain judgment beyond the statistical result.

Design Rationale & References

Licklider's design choices

Licklider defaults to Welch's t-test for all independent two-group designs, without requiring a prior variance test [1]. This follows the methodological position that testing for variance equality before choosing a t-test variant inflates Type I error and is unnecessary given Welch's negligible power cost [1]. Cohen's d is reported alongside every result because p-values alone do not convey the magnitude of a difference, and effect sizes are essential for interpreting practical or clinical relevance [2, 3]. The confidence interval displayed is on the mean difference (not on Cohen's d), which is what the t-test directly estimates.

Methodological foundations

  1. Delacre, M., Lakens, D., & Leys, C. (2017). Why psychologists should by default use Welch's t-test instead of Student's t-test. International Review of Social Psychology, 30(1), 92–101.

    → Demonstrates that Welch's t-test controls Type I error under unequal variances while losing negligible power when variances are equal — the empirical basis for Licklider's unconditional default.

  2. Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, 863.

    → The most widely cited primer on Cohen's d; the basis for Licklider's effect size calculation and reporting defaults.

  3. Sullivan, G. M., & Feinn, R. (2012). Using effect size — or why the p-value is not enough. Journal of Graduate Medical Education, 4(3), 279–282.

    → Clinician-facing argument that statistical significance alone cannot convey practical importance; directly motivates effect size display in Licklider's output panel.

See also