Scatter Plot

When to use a scatter plot, how to add a regression line and confidence band, and what the Inspector shows.

A scatter plot displays the relationship between two continuous variables. Each observation is represented as a point at the coordinates of its x and y values.

Scatter plots are most useful for exploring whether a relationship exists between two variables, how strong it is, and whether the relationship looks linear. They are not, by themselves, evidence of causation.


Basic setup

To create a scatter plot, specify:

  • x-axis variable — the predictor or independent variable
  • y-axis variable — the outcome or dependent variable

If your dataset contains a group column, points are colored by group automatically, with each group appearing as a separate trace in the legend.


Regression line and confidence band

A linear regression line can be overlaid on the scatter plot. When enabled, Licklider adds:

  • A fitted line from the OLS (ordinary least squares) regression of y on x
  • A 95% confidence band showing the uncertainty around the fitted line at each x value. The 95% level follows the convention used in most life-science reporting. If your pre-registration or analysis plan specifies a different level, tell Licklider in the Chat.

The confidence band is calculated as:

y-hat ± t(n−2, 0.025) × SE_y-hat

where SE_y-hat accounts for both residual variance and the distance of each x point from the mean of x. The band is wider at the extremes of the x range and narrower near the center — this is expected behavior, not an error.

When n < 3, the confidence band cannot be calculated and the fitted line is shown alone. (The t-distribution used in the formula has n−2 degrees of freedom and is defined only when that number is positive; at n = 2 it is zero.)
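
As a rough illustration of the formula above, the following Python sketch computes the fitted line and the band using only the standard library. It assumes the standard textbook form SE_y-hat = s × sqrt(1/n + (x − x_bar)² / Sxx), where s is the residual standard error and Sxx is the sum of squared deviations of x from its mean. Licklider's internal implementation is not documented on this page and may differ. The function name confidence_band and the t_crit parameter are hypothetical; the caller supplies t(n−2, 0.025) from a t table or a statistics library.

```python
import math
from statistics import fmean


def confidence_band(xs, ys, t_crit):
    """Return (slope, intercept, band) for an OLS fit of ys on xs.

    band is a list of (x, lower, upper) tuples, or None when n < 3.
    t_crit is the critical value t(n-2, 0.025), supplied by the caller.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations to fit a line")
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    if n < 3:
        # n - 2 degrees of freedom is zero: fitted line only, no band.
        return slope, intercept, None

    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))

    band = []
    for x in sorted(xs):
        y_hat = intercept + slope * x
        # The standard error of the fitted value grows with distance from x_bar.
        se = s * math.sqrt(1 / n + (x - x_bar) ** 2 / sxx)
        band.append((x, y_hat - t_crit * se, y_hat + t_crit * se))
    return slope, intercept, band
```

For five observations the band uses 3 degrees of freedom, and t(3, 0.025) is approximately 3.182, so a call would look like confidence_band(xs, ys, t_crit=3.182). The (x − x_bar)² term is what makes the band widen toward the ends of the x range.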

The regression line describes the linear trend in your data. It does not establish that x causes y. If your analysis involves causal claims, Licklider will flag language in your figure caption that implies causation where only association has been established.

Reference: Altman, N., & Krzywinski, M. (2015). Simple linear regression. Nature Methods, 12(11), 999–1000. https://doi.org/10.1038/nmeth.3627 — Confidence band derivation and interpretation for OLS regression.


What the Inspector shows

When a scatter plot is active with a regression line enabled, the Inspector displays:

Statistic      Description
n              Number of observations used in the regression
R²             Proportion of variance in y explained by x
Slope          Regression coefficient for x
Intercept      Fitted value of y when x = 0
Residual SE    Standard error of the residuals

These values are always shown when a regression line is active, regardless of whether a full regression analysis has been run separately.
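
To make the definitions concrete, here is a minimal Python sketch of how these five values can be computed for a simple regression of y on x. It is an illustration under stated assumptions, not Licklider's code: the function name regression_summary and the dictionary keys are hypothetical, and the residual SE is assumed to use n − 2 degrees of freedom, as is conventional for a one-predictor OLS fit.

```python
import math
from statistics import fmean


def regression_summary(xs, ys):
    """Summary statistics for an OLS fit of ys on xs.

    Assumes n >= 3, at least two distinct x values, and non-constant ys.
    """
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)  # total sum of squares
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return {
        "n": n,
        "r_squared": 1 - sse / syy,  # share of variance in y explained by x
        "slope": slope,
        "intercept": intercept,  # fitted y at x = 0
        "residual_se": math.sqrt(sse / (n - 2)),  # assumed n - 2 denominator
    }
```

Note that the intercept is an extrapolation whenever x = 0 lies outside the observed x range.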


Point labels

Individual data points can be labeled with values from any column in your dataset — for example, a subject ID, sample name, or animal number. This is useful for identifying specific observations, particularly outliers.

Point labels can be set in the Inspector or via Chat. When n > 50, labels are likely to overlap and become hard to read; a note appears in the Inspector in this case.


Display controls

The following can be adjusted from the Inspector or Chat:

Control            Options
Regression line    On / Off
Confidence band    Shown when regression line is on
Group coloring     Automatic when group column is specified
Point labels       Select a column or None
x-axis label       Editable directly in Inspector
y-axis label       Editable directly in Inspector
Axis scale         Linear / Log (x and y independently)
Axis range         Min and max (x and y independently)

Correlation is not causation

Scatter plots are the figure type most often used to imply causal relationships that have not been established.

A regression line shows that x and y are linearly associated in your data. It does not show that changing x will change y. Confounders, reverse causation, and coincidental correlation are all consistent with a strong R².

Licklider checks the language used in your figure captions and analysis descriptions. If causal language appears in a scatter plot context — for example, "x increases y" or "x drives y" — a warning will appear asking you to confirm the intended interpretation.

This check is a language-level prompt, not a statistical detection system. Licklider cannot detect confounders, high-leverage outliers, or violations of OLS assumptions (linearity, constant variance, independence) from the data alone. If these are concerns in your analysis, run a full regression analysis and inspect the residual diagnostics before drawing conclusions.

Reference: Hernán, M.A., & Robins, J.M. (2020). Causal Inference: What If (ch. 1). Chapman & Hall/CRC. — Framework for distinguishing association from causal effect; freely available at https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/


Large datasets

For datasets with 100,000 or more observations, Licklider automatically switches to a WebGL-accelerated renderer (scattergl) to maintain performance. The visual output is identical.
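
The switching rule amounts to a simple threshold check, sketched below in Python. The function and constant names are hypothetical, and "scatter" as the name of the default (non-WebGL) trace type is an assumption based on Plotly's naming; only "scattergl" and the 100,000 threshold come from this page.

```python
WEBGL_THRESHOLD = 100_000  # from the text: "100,000 or more observations"


def choose_trace_type(n_points: int) -> str:
    """Pick a trace type name based on the number of observations."""
    # "scatter" is an assumed name for the default renderer.
    return "scattergl" if n_points >= WEBGL_THRESHOLD else "scatter"
```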

For smaller datasets with many overlapping points, consider using a Density Contour (2D) plot instead, which shows the distribution of point density rather than individual points.


When to use a scatter plot

Scatter plots work well when:

  • You are exploring the relationship between two continuous variables
  • You want to visualize the fit of a linear regression
  • You have labeled observations you want to identify individually
  • n is large enough that individual points are informative

Consider alternatives when:

  • You have many overlapping points at the same coordinates — use Density Contour (2D)
  • You want to show the full regression analysis with residual diagnostics — run a Linear Regression analysis and use the Regression Plot
  • Your relationship is nonlinear — consider a nonlinear regression with an appropriate model

What this page does not cover