Scatter Plot
When to use a scatter plot, how to add a regression line and confidence band, and what the Inspector shows.
A scatter plot displays the relationship between two continuous variables. Each observation is represented as a point at the coordinates of its x and y values.
Scatter plots are most useful for exploring whether a relationship exists between two variables, how strong it is, and whether the relationship looks linear. They are not, by themselves, evidence of causation.
Basic setup
To create a scatter plot, specify:
- x-axis variable — the predictor or independent variable
- y-axis variable — the outcome or dependent variable
If your dataset contains a group column, points are colored by group automatically, with each group appearing as a separate trace in the legend.
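As an illustration of what "one trace per group" means, the sketch below splits rows into per-group point sets. The function name `traces_by_group`, the row format (a list of dicts), and the column names are hypothetical; this is not Licklider's internal code.

```python
def traces_by_group(rows, x_col, y_col, group_col):
    """Split rows (a list of dicts) into one point set per group value.

    Groups appear in order of first occurrence, which is also the
    order a legend would list them in under this sketch.
    """
    traces = {}
    for row in rows:
        key = row[group_col]
        trace = traces.setdefault(key, {"name": str(key), "x": [], "y": []})
        trace["x"].append(row[x_col])
        trace["y"].append(row[y_col])
    return list(traces.values())


# Hypothetical example data: two treatment arms.
rows = [
    {"dose": 1.0, "response": 2.1, "arm": "control"},
    {"dose": 2.0, "response": 3.9, "arm": "treated"},
    {"dose": 3.0, "response": 6.2, "arm": "control"},
]
group_traces = traces_by_group(rows, "dose", "response", "arm")
```

Each returned entry corresponds to one legend item and would receive its own color.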
Regression line and confidence band
A linear regression line can be overlaid on the scatter plot. When enabled, Licklider adds:
- A fitted line from the OLS (ordinary least squares) regression of y on x
- A 95% confidence band showing the uncertainty around the fitted line at each x value. The 95% level follows the convention used in most life-science reporting; if your pre-registration or analysis plan specifies a different level, tell Licklider in Chat.
The confidence band is calculated as:
y-hat ± t(n−2, 0.025) × SE_y-hat
where SE_y-hat accounts for both residual variance and the distance of each x point from the mean of x. The band is wider at the extremes of the x range and narrower near the center — this is expected behavior, not an error.
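For simple linear regression, the standard error of the fitted value at a point x_0 is usually written as below, where s is the residual standard error and x-bar is the mean of x. This is the textbook expression; it is assumed, not confirmed, to be the form Licklider evaluates.

```latex
\mathrm{SE}_{\hat{y}}(x_0) = s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}
```

The second term under the square root grows with the distance of x_0 from x-bar, which is why the band flares toward the ends of the x range.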
When n < 3, the confidence band cannot be calculated and the fitted line is shown alone. The band uses a t-distribution with n−2 degrees of freedom; at n = 2 there are zero degrees of freedom, so neither the critical value nor the residual variance is defined.
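The calculation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Licklider's implementation: the function name `ols_confidence_band` is hypothetical, and the critical value t(n−2, 0.025) is passed in by the caller (for example from a t table, or from `scipy.stats.t.ppf(0.975, n - 2)` if SciPy is available).

```python
import math


def ols_confidence_band(x, y, t_crit):
    """Fit y on x by OLS and return (fitted, lower, upper) at each x.

    t_crit is the two-sided critical value t(n-2, 0.025).
    lower and upper are None when n < 3, because there are no
    residual degrees of freedom. Assumes at least two distinct x values.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    if n < 3:
        return fitted, None, None

    # Residual standard error with n - 2 degrees of freedom.
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    s = math.sqrt(ss_res / (n - 2))

    # SE of the fitted value: grows with distance of x from its mean.
    se_fit = [s * math.sqrt(1 / n + (xi - x_bar) ** 2 / sxx) for xi in x]
    lower = [f - t_crit * se for f, se in zip(fitted, se_fit)]
    upper = [f + t_crit * se for f, se in zip(fitted, se_fit)]
    return fitted, lower, upper


# Hypothetical example data; 3.182 is t(0.975) for 3 degrees of freedom.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fitted, lower, upper = ols_confidence_band(x, y, t_crit=3.182)
```

Comparing `upper[i] - lower[i]` at the first and middle points shows the band widening away from the mean of x.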
The regression line describes the linear trend in your data. It does not establish that x causes y. If your analysis involves causal claims, Licklider will flag language in your figure caption that implies causation where only association has been established.
Reference: Altman, N., & Krzywinski, M. (2015). Simple linear regression. Nature Methods, 12(11), 999–1000. https://doi.org/10.1038/nmeth.3627 — Confidence band derivation and interpretation for OLS regression.
What the Inspector shows
When a scatter plot is active with a regression line enabled, the Inspector displays:
| Statistic | Description |
|---|---|
| n | Number of observations used in the regression |
| R² | Proportion of variance in y explained by x |
| Slope | Regression coefficient for x |
| Intercept | Fitted value of y when x = 0 |
| Residual SE | Residual standard error: the estimated standard deviation of the residuals |
These values are always shown when a regression line is active, regardless of whether a full regression analysis has been run separately.
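To make the table concrete, the following sketch computes the same five quantities for a simple regression of y on x. The function name `regression_summary` and the dictionary keys are hypothetical, and Residual SE is assumed to mean the square root of the residual sum of squares divided by n−2; Licklider's exact definitions may differ.

```python
import math


def regression_summary(x, y):
    """Return n, R-squared, slope, intercept, and residual SE for y on x.

    Assumes at least two distinct x values and non-constant y.
    residual_se is None when n < 3 (no residual degrees of freedom).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return {
        "n": n,
        "r_squared": 1 - ss_res / ss_tot,
        "slope": slope,
        "intercept": intercept,
        "residual_se": math.sqrt(ss_res / (n - 2)) if n > 2 else None,
    }


# Hypothetical example data.
summary = regression_summary([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])
```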
Point labels
Individual data points can be labeled with values from any column in your dataset — for example, a subject ID, sample name, or animal number. This is useful for identifying specific observations, particularly outliers.
Point labels can be set in the Inspector or via Chat. When n > 50, labels are likely to overlap and reduce readability; a note appears in the Inspector in this case.
Display controls
The following can be adjusted from the Inspector or Chat:
| Control | Options |
|---|---|
| Regression line | On / Off |
| Confidence band | Shown when regression line is on |
| Group coloring | Automatic when group column is specified |
| Point labels | Select a column or None |
| x-axis label | Editable directly in Inspector |
| y-axis label | Editable directly in Inspector |
| Axis scale | Linear / Log (x and y independently) |
| Axis range | Min and max (x and y independently) |
Correlation is not causation
Scatter plots are frequently used to imply causal relationships that have not been established.
A regression line shows that x and y are linearly associated in your data. It does not show that changing x will change y. Confounders, reverse causation, and coincidental correlation are all consistent with a strong R².
Licklider checks the language used in your figure captions and analysis descriptions. If causal language appears in a scatter plot context — for example, "x increases y" or "x drives y" — a warning will appear asking you to confirm the intended interpretation.
This check is a language-level prompt, not a statistical detection system. Licklider cannot detect confounders, high-leverage outliers, or violations of OLS assumptions (linearity, constant variance, independence) from the data alone. If these are concerns in your analysis, run a full regression analysis and inspect the residual diagnostics before drawing conclusions.
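A language-level check of this kind can be pictured as simple pattern matching on caption text. The sketch below is purely illustrative: the pattern list and the function name `find_causal_language` are hypothetical, and the phrases Licklider actually checks are not documented on this page.

```python
import re

# Hypothetical list of causal phrasings; not Licklider's actual rule set.
CAUSAL_PATTERNS = [
    r"\bincreases?\b",
    r"\bdecreases?\b",
    r"\bdrives?\b",
    r"\bcauses?\b",
    r"\bleads? to\b",
]


def find_causal_language(caption):
    """Return the causal phrases found in a caption, grouped by pattern order."""
    hits = []
    for pattern in CAUSAL_PATTERNS:
        for match in re.finditer(pattern, caption, flags=re.IGNORECASE):
            hits.append(match.group(0))
    return hits


flagged = find_causal_language("Dose increases response")
clean = find_causal_language("Dose is associated with response")
```

A matcher like this only sees wording. It will flag harmless phrases and miss subtle ones, which is consistent with treating the result as a prompt to confirm intent rather than as a statistical finding.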
Reference: Hernán, M.A., & Robins, J.M. (2020). Causal Inference: What If (ch. 1). Chapman & Hall/CRC. — Framework for distinguishing association from causal effect; freely available at https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
Large datasets
For datasets with 100,000 or more observations, Licklider automatically switches to a WebGL-accelerated renderer (scattergl) to maintain performance. The visual output is identical.
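The switch can be pictured as a threshold on the number of points. The sketch below assumes Plotly-style trace dictionaries, where `scatter` and `scattergl` are the trace types; the constant `WEBGL_THRESHOLD` and the function `scatter_trace` are hypothetical names, not part of Licklider.

```python
WEBGL_THRESHOLD = 100_000  # threshold stated on this page


def scatter_trace(x, y):
    """Build a marker trace spec, using the WebGL type for large inputs."""
    x = list(x)
    y = list(y)
    trace_type = "scattergl" if len(x) >= WEBGL_THRESHOLD else "scatter"
    return {"type": trace_type, "mode": "markers", "x": x, "y": y}


small = scatter_trace(range(10), range(10))
large = scatter_trace(range(100_000), range(100_000))
```

Only the `type` field differs between the two specs; the data and marker mode are the same in both.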
For smaller datasets with many overlapping points, consider using a Density Contour (2D) plot instead, which shows the distribution of point density rather than individual points.
When to use a scatter plot
Scatter plots work well when:
- You are exploring the relationship between two continuous variables
- You want to visualize the fit of a linear regression
- You have labeled observations you want to identify individually
- n is large enough that individual points are informative
Consider alternatives when:
- You have many overlapping points at the same coordinates — use Density Contour (2D)
- You want to show the full regression analysis with residual diagnostics — run a Linear Regression analysis and use the Regression Plot
- Your relationship is nonlinear — consider a nonlinear regression with an appropriate model
What this page does not cover
- Full regression analysis with coefficient tables and diagnostics → see Linear Regression (OLS)
- Nonlinear curve fitting → see Non-linear Regression and IC50/4PL
- Regression diagnostic plots → see Residual Plot
- Assumption checks for regression → see Regression Diagnostics Guard
- Density-based visualization of point clouds → see Density Contour (2D)