Power Analysis and Sample Size

How to calculate the sample size your experiment needs, what Licklider computes, why post-hoc power is not supported, and where the current support boundary sits.

A power analysis answers one question before you collect data: how many observations do you need to have a reasonable chance of detecting the effect you care about, if it exists?

Running this calculation before an experiment is one of the most effective steps a researcher can take to protect the validity of their results. Underpowered studies miss real effects. They also waste resources and contribute to the replication crisis in ways that are difficult to correct after the fact.


The core quantities — and design parameters

Every power calculation involves four core quantities:

| Quantity | Symbol | Typical value |
| --- | --- | --- |
| Significance threshold | alpha | 0.05 |
| Statistical power | 1 − beta | 0.80 |
| Effect size | d, f, or w | depends on field and design |
| Sample size | n | what you are solving for |

Fix any three, and the fourth is determined.
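
This trade-off can be sketched outside Licklider as well. The following uses statsmodels' power module as an illustration (not Licklider's own engine); the d = 0.5 value is an assumed example effect size:

```python
# Illustration of the four-quantity trade-off: fix any three and
# solve for the fourth (statsmodels sketch, not Licklider itself).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Fix effect size, alpha, and power; solve for n per group.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n_per_group))  # about 64 per group for d = 0.5

# Fix effect size, alpha, and n instead; solve for power.
achieved = analysis.power(effect_size=0.5, nobs1=64, alpha=0.05)
print(round(achieved, 2))  # about 0.80
```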

However, most designs require additional parameters that also affect the required sample size:

  • Number of groups — for one-way ANOVA, adding groups changes the required n per group even at the same effect size and power
  • Allocation ratio — for independent-samples t-tests, unequal group sizes change the effective sample size
  • Number of cells or bins — for chi-square tests, the degrees of freedom depend on the number of categories, which affects power

Licklider uses the values you specify for these design parameters. The required n it returns is conditional on those assumptions — changing them will change the result.
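
The influence of a design parameter can be seen by holding everything else fixed and varying only the group count. A statsmodels sketch (not Licklider's panel), with f = 0.25 as an assumed example effect size:

```python
# Required n per group at fixed effect size (f = 0.25), alpha, and
# target power, varying only the number of groups.
from math import ceil
from statsmodels.stats.power import FTestAnovaPower

anova = FTestAnovaPower()
per_group = {}
for k in (3, 4, 5):
    # solve_power returns the required TOTAL n across all groups
    total_n = anova.solve_power(effect_size=0.25, alpha=0.05,
                                power=0.80, k_groups=k)
    per_group[k] = ceil(total_n / k)
    print(f"{k} groups: about {per_group[k]} per group")
```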

The defaults in Licklider are alpha = 0.05 and target power = 0.80. These are common planning conventions, not universal scientific truths. They are useful starting points because they are widely recognized and make studies easier to compare, but some designs justify stricter alpha, higher power, or both.

When the analysis context already determines the method, Licklider does not ask you to choose a redundant test type. In chi-square power planning, for example, the panel infers the chi-square context from the selected contingency-table variables and proceeds directly to effect size, alpha, and target power.


What Licklider calculates

Power analysis is available for the following tests:

| Test | Effect size input | Design parameters | Output |
| --- | --- | --- | --- |
| Independent-samples t-test | Cohen's d | Allocation ratio (default: equal groups) | Required n per group |
| Paired t-test | Cohen's d (paired) | (none) | Required n (pairs) |
| One-way ANOVA | Cohen's f | Number of groups | Required n per group |
| Chi-square test | Cohen's w | Number of bins / cells | Required total n |
| Correlation | Pearson r | (none) | Required total n |
| Linear regression | Cohen's f<sup>2</sup> | Number of predictors | Required total n |
| Repeated-measures ANOVA | eta<sup>2</sup> | Number of measurements, within-subject correlation, epsilon | Required total n |

All t-test calculations assume a two-sided test. One-sided tests are not currently supported.

This is a conservative default. Two-sided planning is the safer general choice unless the direction of the effect is genuinely fixed before data collection and a one-sided design can be defended in the protocol.

A note on chi-square effect size: The standard input for chi-square power calculations is Cohen's w, not Cramer's V. If you have a Cramer's V from prior literature, you can convert it: w = V × sqrt(min(rows, columns) − 1). The result depends on table dimensions, so always verify the conversion against the specific contingency table structure you are working with.
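
The conversion above can be written as a small helper. The function name and example values here are illustrative; the table dimensions are the assumption to check:

```python
from math import sqrt

def cramers_v_to_w(v: float, rows: int, cols: int) -> float:
    """Cohen's w from Cramer's V: w = V * sqrt(min(rows, cols) - 1)."""
    return v * sqrt(min(rows, cols) - 1)

# For any 2 x k table, min(rows, cols) - 1 == 1, so w equals V.
print(cramers_v_to_w(0.30, 2, 4))  # 0.30
# For a 3 x 3 table, the same V implies a larger w.
print(round(cramers_v_to_w(0.30, 3, 3), 3))  # 0.30 * sqrt(2), about 0.424
```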

Survival designs are supported with constraints: Licklider's current power contract includes log-rank and Cox sample-size planning. These workflows require survival-specific inputs such as a hazard-ratio or median-survival mode and should still be checked carefully against the assumptions of the planned follow-up and event process.


How to run a calculation

Open the Power panel for any supported analysis. Two paths are available:

Provide an effect size directly Enter a Cohen's d, f, or w, a Pearson r, or an f<sup>2</sup> or eta<sup>2</sup> value based on prior literature, a pilot study, or a defined minimal effect of interest. Specify the relevant design parameters — number of groups for ANOVA, number of bins for chi-square, allocation ratio for unequal-group t-tests, or repeated-measures settings when applicable. Licklider returns the required sample size and the achieved power at that n.
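
The direct-entry path can be reproduced outside the panel as a statsmodels sketch. The w = 0.3 value and the 3 × 4 table structure are assumed examples; note that statsmodels parameterizes degrees of freedom through a bin count:

```python
# Required total n for a chi-square test, entering Cohen's w directly.
# GofChisquarePower uses df = n_bins - 1, so for an r x c contingency
# table pass (r - 1) * (c - 1) + 1 as n_bins.
from statsmodels.stats.power import GofChisquarePower

rows, cols = 3, 4                      # assumed table structure
df = (rows - 1) * (cols - 1)           # 6 degrees of freedom
total_n = GofChisquarePower().solve_power(effect_size=0.3, alpha=0.05,
                                          power=0.80, n_bins=df + 1)
print(round(total_n))  # roughly 150 total observations for w = 0.3
```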

Use your existing data If you have pilot data or a preliminary dataset loaded, Licklider estimates the effect size from your data and uses it as the input. Design parameters are inferred from the data structure where possible. Be aware that effect sizes estimated from small samples are unstable — treat the result as a rough guide, not a precise target.
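
The pilot-data path can be approximated like this (hypothetical pilot measurements; the instability warning above applies to the estimated d):

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Hypothetical pilot data, n = 8 per group.
pilot_a = np.array([9.8, 11.2, 10.4, 8.9, 10.1, 9.5, 10.8, 9.2])
pilot_b = np.array([10.6, 11.0, 11.3, 9.7, 10.9, 10.2, 11.6, 9.9])

# Cohen's d from the pilot, using the pooled standard deviation.
pooled_sd = np.sqrt((pilot_a.var(ddof=1) + pilot_b.var(ddof=1)) / 2)
d = abs(pilot_b.mean() - pilot_a.mean()) / pooled_sd

n = TTestIndPower().solve_power(effect_size=d, alpha=0.05, power=0.80)
# With only 8 observations per group, d is a noisy estimate, so the
# resulting n is a rough guide, not a precise target.
```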

Important: Licklider can calculate from the effect size you enter or estimate, but it does not automatically know whether that effect size is optimistic, whether your observations are truly independent, whether clustering or repeated measures should change the design, or whether you should inflate the final n to account for dropouts, assay failure, or missingness. Those are study-design decisions, not panel settings the software can safely infer from the table alone.

Default values: alpha = 0.05, target power = 0.80. Both can be adjusted in the panel.


Choosing an effect size

This is the step most researchers find difficult, and it is the most consequential. A few approaches:

From prior literature Search for studies using the same outcome measure and experimental design. Use the reported effect size as a starting point. Verify that the effect size metric matches what Licklider expects for your test — Cohen's d, f, or w, Pearson r, f<sup>2</sup>, or eta<sup>2</sup> — and that the design parameters (number of groups, table structure) are comparable.

From a minimal effect of interest Define the smallest difference that would be scientifically or practically meaningful, then convert to a standardized effect size. This approach is more defensible than using whatever the literature happens to report. → See Minimal Effect of Interest for how to define and use this.

Cohen's conventional benchmarks Cohen (1988) proposed small (d = 0.2), medium (d = 0.5), and large (d = 0.8) as rough reference points for behavioral research. These are widely misapplied in life sciences, where effect sizes are highly domain-specific. Use them as orientation only, not as substitutes for a field-informed estimate.
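
For orientation only, the benchmarks translate into very different sample sizes for a two-sided independent-samples t-test at the default alpha and power (a statsmodels illustration):

```python
# Required n per group at Cohen's benchmark effect sizes,
# two-sided independent-samples t-test, alpha = 0.05, power = 0.80.
from math import ceil
from statsmodels.stats.power import TTestIndPower

tt = TTestIndPower()
benchmark_n = {}
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    benchmark_n[label] = ceil(tt.solve_power(effect_size=d,
                                             alpha=0.05, power=0.80))
    print(f"{label} (d = {d}): about {benchmark_n[label]} per group")
# small: 394, medium: 64, large: 26 per group
```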

If you only remember one rule: an unrealistic effect size assumption is the fastest way to make a sample-size plan look reassuring while still leaving the study underpowered in practice.


Interpreting the result

Licklider returns two numbers:

  • Required n — the sample size needed (per group, per pair, or total, depending on the test and design) to achieve the target power at the specified effect size, alpha, and design parameters
  • Achieved power — the actual power at the required n (typically slightly above the target due to rounding)
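
The rounding step is why achieved power typically sits slightly above the target. Sketched with statsmodels (d = 0.5 is an assumed example):

```python
# Rounding the analytic n up to a whole number pushes achieved
# power slightly above the 0.80 target.
from math import ceil
from statsmodels.stats.power import TTestIndPower

tt = TTestIndPower()
exact_n = tt.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
required_n = ceil(exact_n)   # sample sizes must be whole numbers
achieved = tt.power(effect_size=0.5, nobs1=required_n, alpha=0.05)
print(required_n)            # 64 per group
print(round(achieved, 3))    # slightly above 0.80
```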

The result is conditional on all inputs. If any assumption changes — the number of groups, the allocation ratio, the expected effect size — rerun the calculation.

If the required n is larger than what your experiment can realistically achieve, the options are:

  1. Accept lower power and acknowledge it as a limitation
  2. Revise the minimal effect of interest upward
  3. Redesign the study to reduce variance (e.g., paired design, better control of confounders)
  4. Consider a pilot study to refine the effect size estimate

The returned n is usually the analytic minimum for the model you chose. If you expect attrition, excluded samples, failed assays, or incomplete pairs, plan above that minimum rather than treating the displayed n as the final enrollment target.
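
One common planning heuristic (a study-design convention, not a Licklider feature) is to inflate the analytic minimum by the expected loss fraction. The helper below is a hypothetical sketch of that arithmetic:

```python
from math import ceil

def inflate_for_dropout(analytic_n: int, expected_loss: float) -> int:
    """Enrollment target so that roughly analytic_n usable observations
    remain after an expected fractional loss (dropout, failed assays)."""
    if not 0 <= expected_loss < 1:
        raise ValueError("expected_loss must be in [0, 1)")
    return ceil(analytic_n / (1.0 - expected_loss))

print(inflate_for_dropout(64, 0.15))  # enroll 76 to keep about 64 after 15% loss
```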


Design rationale and references

Licklider uses the standard planning framework of alpha, power, effect size, and sample size because those four quantities define the basic trade-off of confirmatory study planning. The additional design inputs shown in the panel — such as number of groups, allocation ratio, and table size — are exposed because they materially change the required n even when the nominal effect size stays the same [1, 2].

The defaults of alpha = 0.05 and power = 0.80 are common conventions in applied research, included as starting values rather than as claims of universal optimality [1, 3]. Licklider lets you adjust both because some studies need tighter false-positive control, stronger power, or both.

Licklider does not offer post-hoc power because observed-power calculations are largely a re-expression of the same p-value and add little interpretive value after the data are already observed [4].

The workflow emphasizes minimal effect of interest and confidence interval thinking because those are usually more decision-relevant than retrospective statements about whether an already observed result was "adequately powered" [2, 4].

Methodological foundations

  1. Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates. -> Canonical reference for alpha, power, and standardized effect-size conventions used in many planning workflows.

  2. Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of Experimental Social Psychology, 74, 187-195. -> Explains why effect sizes estimated from small pilot samples can mislead sample-size planning.

  3. Biau, D. J., Kernéis, S., & Porcher, R. (2008). Statistics in brief: The importance of sample size in the planning and interpretation of medical research. Clinical Orthopaedics and Related Research, 466(9), 2282-2288. -> Overview of why planning conventions such as alpha and target power matter, and why sample-size calculations are conditional on design assumptions.

  4. Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55(1), 19-24. -> Direct basis for Licklider's decision not to support post-hoc power from observed effects.


Post-hoc power is not supported

Licklider does not calculate post-hoc power — specifically, power estimated after data collection using the observed effect size from the analyzed data itself.

This is a deliberate decision.

Post-hoc power calculated from the observed effect size is mathematically determined by the p-value of the same test. A non-significant result will always produce low post-hoc power. The calculation adds no information beyond what the p-value already contains, and routinely leads to the circular conclusion that "the result was non-significant, therefore the study was underpowered" — which says nothing about whether the effect is real (Hoenig & Heisey, 2001, The American Statistician).

If your experiment has already been run and you want to assess what the data can and cannot tell you, use the confidence interval.

The confidence interval shows the range of effect sizes compatible with your data. If the interval is wide, the data are imprecise — regardless of the p-value. If the interval excludes effect sizes you would consider meaningful, the data provide reasonable evidence against an effect of that magnitude. This is the correct question to ask after the fact.

Effect size and confidence interval are reported automatically for all supported analyses in Licklider.


Current support boundary

  • Licklider's sample-size outputs assume the design model you selected is the correct one for the study; the panel does not automatically detect clustering, repeated measurements, matched structures, or pseudoreplication.
  • Licklider does not automatically judge whether an effect size taken from pilot data or prior literature is inflated, optimistic, or scientifically meaningful for your own assay.
  • Licklider does not automatically add margin for dropout, failed measurements, unusable samples, or other practical losses between enrollment and final analysis.
  • The supported calculations on this page are limited to the tests listed in the table above, together with the log-rank and Cox survival planning described earlier.
  • Sample-size planning for designs outside that list is not currently covered here and should use design-specific external methods.

What this page does not cover