Linear Regression (OLS)

How to run ordinary least squares regression in Licklider, what the results include, and how the model is visualized.

Linear regression models the relationship between a continuous outcome variable and one or more predictor variables. Licklider fits ordinary least squares (OLS) regression using a standard implementation and returns the model coefficients, standard errors, p-values, confidence intervals, and fit statistics.


When to use linear regression

Linear regression is appropriate when:

  • The outcome variable is continuous and unbounded
  • You want to estimate how the outcome changes with one or more predictors
  • The relationship between predictors and outcome is approximately linear

If the outcome is binary or a proportion, logistic regression is more appropriate → see Logistic Regression and AUC/ROC.

If the relationship is non-linear — for example, a sigmoidal dose-response curve — non-linear regression is more appropriate → see Non-linear Regression and IC50/4PL.


How to request it

Describe the analysis in the Chat. For example:

  • "Run a linear regression of body weight on dose"
  • "Regress gene expression on treatment intensity and age"
  • "Show the relationship between X and Y with a regression line"

Licklider will fit the model and display the results.


Fit results vs diagnostics

The regression fit and the regression diagnostics claim are tracked as separate states. A model fit can succeed while the diagnostics claim is still shown as unresolved, manual-review required, stale, fallback-based, or descriptive-only. In any of those states the coefficient table is available, but the diagnostics have not been cleared for claim-bearing use.

For OLS fits, the current diagnostics projection is summary-only. It can report residual normality, homoskedasticity, residual independence, multicollinearity through VIF, and influence summaries. These checks are displayed as diagnostic facts and caveats, not as automatic model repair.

Robust standard errors are shown separately as alternate evidence. Licklider computes HC0, HC1, HC2, and HC3 summaries, with HC3 shown as the default alternate. This does not replace the ordinary OLS coefficient table: the primary standard errors, p-values, confidence intervals, and fit statistics remain the ordinary OLS results unless a future workflow explicitly adopts a different method.

Exports preserve the same Core projection used by the panel. Until durable governance is introduced, this projection is ephemeral figure-level evidence rather than database-backed audit evidence.


What the results include

Coefficient table

One row per predictor, including the intercept. Each row shows:

  • Estimate — the regression coefficient
  • Standard error
  • t-statistic
  • p-value
  • 95% confidence interval (lower and upper bounds)
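As a rough illustration of where the first three columns come from, the sketch below computes estimates, standard errors, and t-statistics with NumPy. It is a minimal sketch under textbook OLS assumptions, not Licklider's implementation; the function name `ols_coefficient_table` and the dose/weight values are hypothetical.

```python
import numpy as np

def ols_coefficient_table(X, y):
    """Return OLS estimates, standard errors, and t-statistics.

    X: (n,) or (n, p) predictors WITHOUT an intercept column.
    y: (n,) outcome vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])  # intercept is row 0 of the table
    k = design.shape[1]

    # Least-squares solution of design @ beta ~= y
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta

    dof = n - k                                # residual degrees of freedom
    sigma2 = residuals @ residuals / dof       # residual variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)
    std_error = np.sqrt(np.diag(cov))
    return {
        "estimate": beta,
        "std_error": std_error,
        "t_stat": beta / std_error,
        "dof": dof,
    }

# Hypothetical example data
dose = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
weight = np.array([20.1, 21.9, 24.2, 25.8, 28.1, 29.9])
table = ols_coefficient_table(dose, weight)
```

The p-values and confidence bounds additionally need the Student t distribution with `dof` degrees of freedom, for example from `scipy.stats.t`.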

Robust SE alternate evidence

When robust standard error evidence is available, it is displayed as advisory parallel evidence. It can help compare the ordinary OLS inference with HC0-HC3 heteroskedasticity-consistent alternatives, but it is not an automatic switch to robust inference and should not be described as adopted or executed primary inference.
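For readers unfamiliar with the HC family, the sketch below shows the standard sandwich formulas for HC0 through HC3 next to the ordinary OLS standard errors. It assumes the usual textbook definitions; the function name `hc_standard_errors` and the data are hypothetical, and Licklider's internal computation may differ in detail.

```python
import numpy as np

def hc_standard_errors(design, y):
    """Ordinary and HC0-HC3 standard errors for an OLS fit.

    design: (n, k) design matrix that already includes the intercept column.
    y:      (n,) outcome vector.
    """
    design = np.asarray(design, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = design.shape
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    e = y - design @ beta
    # Leverage h_i: diagonal of the hat matrix design @ xtx_inv @ design.T
    h = np.einsum("ij,jk,ik->i", design, xtx_inv, design)

    def sandwich(weights):
        # (X'X)^-1 X' diag(weights) X (X'X)^-1, then sqrt of the diagonal
        meat = design.T @ (weights[:, None] * design)
        return np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))

    return {
        "ols": np.sqrt(np.diag(xtx_inv) * (e @ e) / (n - k)),
        "HC0": sandwich(e**2),
        "HC1": sandwich(e**2 * n / (n - k)),   # degrees-of-freedom correction
        "HC2": sandwich(e**2 / (1 - h)),       # leverage-adjusted
        "HC3": sandwich(e**2 / (1 - h) ** 2),  # more conservative adjustment
    }

# Hypothetical data whose noise grows with x
x = np.arange(1.0, 11.0)
noise = np.array([0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, -1.0])
y = 2.0 + 0.5 * x + noise
se = hc_standard_errors(np.column_stack([np.ones_like(x), x]), y)
```

Comparing `se["ols"]` with `se["HC3"]` mirrors the side-by-side reading described above: the robust values sit alongside the ordinary ones and do not replace them.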

Model fit statistics

  • R² — the proportion of variance in the outcome explained by the model
  • Adjusted R² — R² penalized for the number of predictors
  • F-statistic and its p-value — the overall test of whether any predictor explains the outcome
  • Residual standard error
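For reference, the standard textbook definitions of these statistics are given below, for a model with an intercept, n observations, and p predictors. This assumes Licklider follows the usual conventions; the symbols are introduced here for illustration only.

```latex
\[
\begin{aligned}
SS_{\mathrm{res}} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
SS_{\mathrm{tot}} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
R^2 &= 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}} \\
R^2_{\mathrm{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \\
F &= \frac{(SS_{\mathrm{tot}} - SS_{\mathrm{res}})/p}{SS_{\mathrm{res}}/(n - p - 1)} \\
\mathrm{RSE} &= \sqrt{\frac{SS_{\mathrm{res}}}{n - p - 1}}
\end{aligned}
\]
```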

Correlation

When a linear regression is run on a scatter or regression chart, Licklider automatically calculates both Pearson and Spearman correlation coefficients. The results appear in the Correlation panel of the Inspector, alongside the regression output.

The primary correlation is selected based on the normality of the regression residuals:

  • If residuals show no significant departure from normality (Shapiro-Wilk p > 0.05): Pearson is primary
  • Otherwise (Shapiro-Wilk p ≤ 0.05): Spearman is primary

Both coefficients are always reported. The primary designation indicates which is statistically appropriate given the data. For a full discussion, see Correlation Analysis.
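The selection rule can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the Shapiro-Wilk p-value is assumed to have been computed elsewhere (for example with `scipy.stats.shapiro` on the residuals), and Spearman is computed as the Pearson correlation of average ranks.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def correlations_with_primary(x, y, shapiro_p, alpha=0.05):
    """Report both coefficients and mark one as primary.

    shapiro_p: Shapiro-Wilk p-value of the regression residuals.
    """
    pearson = float(np.corrcoef(x, y)[0, 1])
    spearman = float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
    primary = "pearson" if shapiro_p > alpha else "spearman"
    return {"pearson": pearson, "spearman": spearman, "primary": primary}

# Hypothetical monotonic but non-linear data
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
result = correlations_with_primary(x, y, shapiro_p=0.01)
```

Both values are always present in the returned dictionary; only the `primary` label changes with the residual normality result.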


Visualization

When a linear regression is run on a two-variable scatter plot, Licklider automatically overlays:

  • The fitted regression line
  • A 95% confidence band around the mean response

This band is intentionally a confidence band for the estimated mean response, not a prediction interval for individual future observations. It shows how uncertain the estimated mean trend is and does not represent the spread of individual observations around the line. That choice keeps the default figure aligned with the fitted regression line itself.
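The difference between the two intervals is easiest to see in the single-predictor case. The formulas below are the standard textbook forms, with s the residual standard error and t the 97.5th percentile of the t distribution with n - 2 degrees of freedom; the notation is introduced here for illustration and is assumed, not taken from Licklider's implementation.

```latex
\[
\begin{aligned}
\text{Confidence band (mean response):}\quad
& \hat{y}(x_0) \pm t_{0.975,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \\
\text{Prediction interval (new observation):}\quad
& \hat{y}(x_0) \pm t_{0.975,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \\
\text{where}\quad & S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
\end{aligned}
\]
```

The extra 1 under the square root accounts for the scatter of an individual observation, which is why a prediction interval is always wider than the band drawn on the chart.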


Multiple predictors

Linear regression with more than one predictor is supported. Each predictor's coefficient represents its estimated effect on the outcome holding all other predictors constant.

When multiple predictors are included, Licklider evaluates the predictor structure for potential issues — including collinearity and sample size adequacy — before allowing claim-bearing output. This guard is meant to catch common structural problems that make regression coefficients unstable or hard to interpret, especially when the model is too complex for the available sample or when predictors overlap heavily in what they measure. For more detail → see Regression Diagnostics Guard.

The guard does not certify that the model is fully valid. It focuses on predictor structure, not on whether the relationship is truly linear, whether residual variance is constant, whether influential points dominate the fit, or whether clustered observations violate independence.


Assumptions

Linear regression assumes:

  • The outcome variable is continuous
  • The relationship between predictors and outcome is linear
  • Residuals are approximately normally distributed
  • Residuals have roughly constant variance (homoskedasticity)
  • Observations are independent

Licklider emits summary diagnostics for OLS models:

  • Shapiro-Wilk residual normality
  • Breusch-Pagan homoskedasticity
  • Durbin-Watson residual independence, with a row-order caveat
  • Variance inflation factor for multicollinearity
  • Influence summary counts based on leverage and Cook's distance thresholds

These summaries do not include row-level residuals, leverage values, Cook's D values, or DFBETAS arrays. Review any warning, failed, or manual-review state before treating the model as claim-bearing output.
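To make a few of these summaries concrete, the sketch below computes a Durbin-Watson statistic, variance inflation factors, and influence counts with NumPy. It is a sketch under stated assumptions: the function names are hypothetical, and the cutoffs used for the counts (leverage above 2k/n, Cook's distance above 4/n) are common rules of thumb, not confirmed Licklider thresholds.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squares.

    Depends on row order; values near 2 suggest little first-order autocorrelation.
    """
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

def vif(X):
    """VIF per column of X (predictors only, no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        fitted = others @ np.linalg.lstsq(others, target, rcond=None)[0]
        ss_res = np.sum((target - fitted) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        out.append(ss_tot / ss_res)  # equals 1 / (1 - R_j^2)
    return np.array(out)

def influence_summary(X, y):
    """Counts of high-leverage and high Cook's distance rows (assumed cutoffs)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    k = design.shape[1]
    xtx_inv = np.linalg.inv(design.T @ design)
    e = y - design @ (xtx_inv @ design.T @ y)
    h = np.einsum("ij,jk,ik->i", design, xtx_inv, design)  # leverage values
    s2 = e @ e / (n - k)
    cooks = e**2 * h / (k * s2 * (1 - h) ** 2)
    return {
        "high_leverage_count": int(np.sum(h > 2 * k / n)),  # assumed cutoff
        "high_cooks_count": int(np.sum(cooks > 4 / n)),     # assumed cutoff
    }

# Hypothetical data with one far-out x value
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 5.0])
summary = influence_summary(x, y)
```

As in the panel, the result is a pair of counts rather than row-level values.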

These checks reduce common mistakes, but they do not validate your study design for you. In particular, Licklider does not automatically determine whether rows that look separate are actually repeated measurements from the same subject, animal, plate, well, batch, or cluster. If that structure is hidden in the table, ordinary OLS can report coefficients, standard errors, and p-values that look more certain than they should.

Licklider also does not automatically prove that the predictor-outcome relationship is linear, that residuals are well-behaved across the full range of fitted values, or that the result is not being driven by a small number of influential points. Those are model-checking questions, not guarantees of the basic OLS fit.

If your data are clustered, repeatedly measured, strongly non-linear, or visibly heteroskedastic, pause before interpreting the OLS result and review Repeated Measures and Mixed Models, the Regression Diagnostics Guard, and the relevant diagnostic plots.


Design rationale and references

Licklider shows coefficients, confidence intervals, and fit statistics because regression is usually used to support directional scientific claims, not just to summarize association. Reporting uncertainty around each coefficient helps readers judge both magnitude and precision rather than focusing on p-values alone.

Licklider also computes Pearson and Spearman correlations alongside the regression output as descriptive companions, not as replacements for the fitted model. This gives new users a quick read on simple association while keeping the regression coefficient table as the main inferential result.

For multi-predictor models, Licklider separates predictor-structure checks from residual diagnostics because overfitting and collinearity are common failure modes before interpretation begins, especially when the model includes many overlapping predictors relative to sample size. That is why the guard can block claim-bearing output for structural problems while still leaving residual diagnostics as a separate interpretive step.

Exact p-values are reported as numeric values rather than threshold labels so readers can interpret evidence in context rather than treating a cutoff as a binary pass-fail rule.

References

  1. Babyak, M. A. (2004). What you see may not be what you get: A brief, nontechnical introduction to overfitting in regression-type models. Psychosomatic Medicine, 66(3), 411-421. https://doi.org/10.1097/01.psy.0000127692.23278.a9
  2. Dormann, C. F., Elith, J., Bacher, S., et al. (2013). Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27-46. https://doi.org/10.1111/j.1600-0587.2012.07348.x
  3. Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129-133. https://doi.org/10.1080/00031305.2016.1154108


What this page does not cover