Linear Regression (OLS)

Linear regression models the relationship between a continuous outcome variable and one or more predictor variables. Licklider fits ordinary least squares (OLS) regression using a standard implementation and returns the model coefficients, standard errors, p-values, confidence intervals, and fit statistics.

When to use linear regression

Linear regression is appropriate when:

The outcome variable is continuous and unbounded
You want to estimate how the outcome changes with one or more predictors
The relationship between predictors and outcome is approximately linear

If the outcome is binary or a proportion, logistic regression is more appropriate → see Logistic Regression and AUC/ROC.

If the relationship is non-linear — for example, a sigmoidal dose-response curve — non-linear regression is more appropriate → see Non-linear Regression and IC50/4PL.

How to request it

Describe the analysis in the Chat. For example:

"Run a linear regression of body weight on dose"
"Regress gene expression on treatment intensity and age"
"Show the relationship between X and Y with a regression line"

Licklider will fit the model and display the results.

What the results include

Coefficient table

One row per predictor, including the intercept. Each row shows:

Estimate — the regression coefficient
Standard error
t-statistic
p-value
95% confidence interval (lower and upper bounds)
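
As a rough illustration of how these columns relate to one another, the sketch below computes them by hand with NumPy and SciPy. It is not Licklider's implementation: the function name coefficient_table and the dose and weight values are hypothetical, made up for this example.

```python
import numpy as np
from scipy import stats


def coefficient_table(X, y, level=0.95):
    """Return OLS estimates, standard errors, t, p, and CI bounds.

    X holds the predictors WITHOUT an intercept column; the intercept
    is added here and appears as the first coefficient.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    p = design.shape[1]
    df_resid = n - p

    # Least-squares estimates and the residual variance.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / df_resid

    # Standard errors from the diagonal of the coefficient covariance matrix.
    cov = sigma2 * np.linalg.inv(design.T @ design)
    se = np.sqrt(np.diag(cov))

    # Two-sided t-test of each coefficient against zero, plus the CI.
    t_stat = beta / se
    p_value = 2 * stats.t.sf(np.abs(t_stat), df_resid)
    t_crit = stats.t.ppf(0.5 + level / 2, df_resid)
    return {
        "estimate": beta,
        "std_error": se,
        "t": t_stat,
        "p_value": p_value,
        "ci_lower": beta - t_crit * se,
        "ci_upper": beta + t_crit * se,
    }


# Made-up example data: dose (predictor) and body weight (outcome).
dose = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
weight = np.array([20.1, 21.9, 24.2, 25.8, 28.1, 30.3, 31.8, 34.1, 36.0, 38.2])

table = coefficient_table(dose, weight)
for name, values in table.items():
    print(name, values)
```

Each array has one entry per row of the coefficient table, with the intercept first and the predictors after it in column order.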

Model fit statistics

R² — the proportion of variance in the outcome explained by the model
Adjusted R² — R² penalized for the number of predictors
F-statistic and its p-value — the overall test of whether any predictor explains the outcome
Residual standard error
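
The fit statistics can all be derived from the residual and total sums of squares. The following is a minimal sketch under that textbook definition, not Licklider's code; the function name fit_statistics and the data are hypothetical.

```python
import numpy as np


def fit_statistics(X, y):
    """R^2, adjusted R^2, F-statistic, and residual standard error for OLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, k = X.shape                      # k = number of predictors, intercept excluded
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta

    rss = resid @ resid                 # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)   # total sum of squares
    df_resid = n - k - 1

    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
    f_stat = ((tss - rss) / k) / (rss / df_resid)
    rse = np.sqrt(rss / df_resid)
    return {"r2": r2, "adj_r2": adj_r2, "f": f_stat,
            "df": (k, df_resid), "rse": rse}


# Made-up data with two predictors.
x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 5, 8, 7], dtype=float)
y = np.array([3.1, 3.9, 7.2, 6.8, 11.1, 10.2, 14.9, 14.1])

fit = fit_statistics(np.column_stack([x1, x2]), y)
print(fit)
```

The p-value for the F-statistic is the upper tail of an F distribution with the two degrees of freedom returned in "df"; with SciPy available, scipy.stats.f.sf(f_stat, k, df_resid) gives that value.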

Correlation

When a linear regression is run on a scatter or regression chart, Licklider automatically calculates both Pearson and Spearman correlation coefficients. The results appear in the Correlation panel of the Inspector, alongside the regression output.

The primary correlation is selected based on the normality of the regression residuals:

If the residuals show no significant departure from normality (Shapiro-Wilk p > 0.05): Pearson is primary
If the residuals depart significantly from normality (Shapiro-Wilk p ≤ 0.05): Spearman is primary

Both coefficients are always reported. The primary designation indicates which coefficient is better suited to the data. For a full discussion, see Correlation Analysis.
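
The selection rule above can be sketched with SciPy as follows. This is an illustration of the rule as described on this page, not Licklider's source; the function name correlations_with_primary, the alpha parameter, and the data are hypothetical.

```python
import numpy as np
from scipy import stats


def correlations_with_primary(x, y, alpha=0.05):
    """Report Pearson and Spearman, and pick a primary from residual normality."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Residuals from the simple linear regression of y on x.
    fit = stats.linregress(x, y)
    resid = y - (fit.intercept + fit.slope * x)

    # Shapiro-Wilk test applied to the regression residuals.
    _, shapiro_p = stats.shapiro(resid)

    pearson_r, _ = stats.pearsonr(x, y)
    spearman_rho, _ = stats.spearmanr(x, y)

    primary = "pearson" if shapiro_p > alpha else "spearman"
    return {
        "pearson": pearson_r,
        "spearman": spearman_rho,
        "shapiro_p": shapiro_p,
        "primary": primary,
    }


# Made-up example data.
dose = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
weight = np.array([20.1, 21.9, 24.2, 25.8, 28.1, 30.3, 31.8, 34.1, 36.0, 38.2])

result = correlations_with_primary(dose, weight)
print(result)
```

Both coefficients are returned regardless of which one is marked primary, mirroring the behavior described above.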

Visualization

When a linear regression is run on a two-variable scatter plot, Licklider automatically overlays:

The fitted regression line
A 95% confidence band around the mean response

The band is intentionally a confidence band for the estimated mean response, not a prediction interval for individual future observations. It shows how uncertain the fitted mean trend is, which keeps the default figure aligned with the regression line itself. It does not represent the spread of individual points around the line, so many observations can fall outside the band even when the model fits well.
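
To make the difference between the two kinds of interval concrete, the sketch below computes both half-widths for a single-predictor model using the standard textbook formulas. It is an illustration only; the function name mean_band_and_prediction_interval and the data are hypothetical, and nothing here is taken from Licklider's plotting code.

```python
import numpy as np
from scipy import stats


def mean_band_and_prediction_interval(x, y, x_grid, level=0.95):
    """Fitted line plus half-widths of the mean band and prediction interval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_grid = np.asarray(x_grid, dtype=float)
    n = len(x)

    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    slope = np.sum((x - x_bar) * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x_bar

    resid = y - (intercept + slope * x)
    s = np.sqrt(resid @ resid / (n - 2))          # residual standard error
    t_crit = stats.t.ppf(0.5 + level / 2, n - 2)

    fitted = intercept + slope * x_grid
    mean_var = 1 / n + (x_grid - x_bar) ** 2 / sxx
    half_mean = t_crit * s * np.sqrt(mean_var)      # confidence band for the mean
    half_pred = t_crit * s * np.sqrt(1 + mean_var)  # prediction interval
    return fitted, half_mean, half_pred


# Made-up example data.
dose = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
weight = np.array([20.1, 21.9, 24.2, 25.8, 28.1, 30.3, 31.8, 34.1, 36.0, 38.2])
grid = np.linspace(0, 9, 19)

fitted, half_mean, half_pred = mean_band_and_prediction_interval(dose, weight, grid)
print(np.column_stack([grid, fitted - half_mean, fitted + half_mean]))
```

The prediction interval is wider than the mean band at every point because of the extra "1 +" term for the variance of an individual observation. The mean band is narrowest at the mean of the predictor and widens toward the edges of the data.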

Multiple predictors

Linear regression with more than one predictor is supported. Each predictor's coefficient is the estimated change in the outcome per one-unit increase in that predictor, holding all other predictors constant.

When multiple predictors are included, Licklider evaluates the predictor structure for potential issues — including collinearity and sample size adequacy — before allowing claim-bearing output. This guard is meant to catch common structural problems that make regression coefficients unstable or hard to interpret, especially when the model is too complex for the available sample or when predictors overlap heavily in what they measure. For more detail → see Regression Diagnostics Guard.

The guard does not certify that the model is fully valid. It focuses on predictor structure, not on whether the relationship is truly linear, whether residual variance is constant, whether influential points dominate the fit, or whether clustered observations violate independence.
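
One common way to quantify how heavily predictors overlap is the variance inflation factor (VIF). The sketch below is a generic illustration of that idea, assuming nothing about which checks or thresholds the Regression Diagnostics Guard actually applies; the function name variance_inflation_factors and the data are hypothetical.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X (predictors only, no intercept column).

    Each predictor is regressed on all the others; VIF = 1 / (1 - R^2).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1 / (1 - r2)
    return vifs


# Made-up predictors: x2 is almost exactly 2 * x1, x3 is unrelated to both.
x1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
x2 = np.array([2.1, 3.9, 5.9, 8.1, 10.1, 11.9, 13.9, 16.1])
x3 = np.array([5, 3, 8, 1, 7, 2, 6, 4], dtype=float)

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vifs)
```

Here x1 and x2 receive very large VIFs because each is almost fully predictable from the other, while x3 stays close to 1. Widely used rules of thumb treat values above roughly 5 to 10 as a warning sign, but those cutoffs are conventions rather than guarantees.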

Assumptions

Linear regression assumes:

The outcome variable is continuous
The relationship between predictors and outcome is linear
Residuals are approximately normally distributed
Residuals have roughly constant variance (homoscedasticity)
Observations are independent

Licklider checks residual normality automatically for single-predictor models. For multi-predictor models, the Regression Diagnostics Guard evaluates predictor structure. Residual diagnostics should be inspected separately when the assumptions are in question.

These checks reduce common mistakes, but they do not validate your study design for you. In particular, Licklider does not automatically determine whether rows that look separate are actually repeated measurements from the same subject, animal, plate, well, batch, or cluster. If that structure is hidden in the table, ordinary OLS can report coefficients, standard errors, and p-values that look more certain than they should.
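
The overconfidence described above can be demonstrated with a deliberately extreme case: entering each subject's single measurement three times as if the rows were independent. This is a made-up illustration with a hypothetical helper, slope_and_se; real repeated measurements are rarely exact duplicates, but the direction of the problem is the same.

```python
import numpy as np


def slope_and_se(x, y):
    """Slope and its standard error from a single-predictor OLS fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    design = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - 2)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta[1], np.sqrt(cov[1, 1])


# Six made-up subjects, one measurement each.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 2.9, 4.4, 4.8, 6.5, 6.6])

# The same six subjects, each entered three times as 18 "independent" rows.
x_rep = np.repeat(x, 3)
y_rep = np.repeat(y, 3)

slope_1, se_1 = slope_and_se(x, y)
slope_3, se_3 = slope_and_se(x_rep, y_rep)
print("slope:", slope_1, slope_3)
print("standard error:", se_1, se_3)
```

The slope is unchanged, but the reported standard error is halved even though no new information was added. OLS treats the 18 rows as 18 independent observations, so the t-statistic grows and the p-value shrinks.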

Licklider also does not automatically prove that the predictor-outcome relationship is linear, that residuals are well-behaved across the full range of fitted values, or that the result is not being driven by a small number of influential points. Those are model-checking questions that the basic OLS fit does not answer on its own.
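
One way to check for influential points yourself is to compute leverage and Cook's distance for each observation. The sketch below uses the standard formulas with NumPy; it is a generic illustration, not a description of any Licklider feature, and the function name leverage_and_cooks_distance and the data are hypothetical.

```python
import numpy as np


def leverage_and_cooks_distance(X, y):
    """Leverage (hat-matrix diagonal) and Cook's distance for each row."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    p = design.shape[1]

    xtx_inv = np.linalg.inv(design.T @ design)
    # h_ii = row_i @ (X'X)^-1 @ row_i, computed for every row at once.
    leverage = np.einsum("ij,jk,ik->i", design, xtx_inv, design)

    beta = xtx_inv @ design.T @ y
    resid = y - design @ beta
    mse = resid @ resid / (n - p)

    cooks = resid**2 / (p * mse) * leverage / (1 - leverage) ** 2
    return leverage, cooks


# Made-up data: the last point sits far from the others in x
# and falls well below the trend of the first seven points.
x = np.array([1, 2, 3, 4, 5, 6, 7, 20], dtype=float)
y = np.array([1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.1, 10.0])

leverage, cooks = leverage_and_cooks_distance(x, y)
for i, (h, d) in enumerate(zip(leverage, cooks)):
    print(i, h, d)
```

In this example the last observation has by far the highest leverage and Cook's distance, which means the fitted slope depends heavily on that one point. Refitting without such a point and comparing the coefficients is a simple follow-up check.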

If your data are clustered, repeatedly measured, strongly non-linear, or visibly heteroscedastic, pause before interpreting the OLS result and review Repeated Measures and Mixed Models, the Regression Diagnostics Guard, and the relevant diagnostic plots.

Design rationale and references

Licklider shows coefficients, confidence intervals, and fit statistics because regression is usually used to support directional scientific claims, not just to summarize association. Reporting uncertainty around each coefficient helps readers judge both magnitude and precision rather than focusing on p-values alone.

Licklider also computes Pearson and Spearman correlations alongside the regression output as descriptive companions, not as replacements for the fitted model. This gives new users a quick read on simple association while keeping the regression coefficient table as the main inferential result.

For multi-predictor models, Licklider separates predictor-structure checks from residual diagnostics. Overfitting and collinearity are common failure modes that need to be caught before interpretation begins, especially when the model includes many overlapping predictors relative to sample size. That is why the guard can block claim-bearing output for structural problems while still leaving residual diagnostics as a separate interpretive step.

Exact p-values are reported as numeric values rather than threshold labels so readers can interpret evidence in context rather than treating a cutoff as a binary pass-fail rule.

References

Babyak, M. A. (2004). What you see may not be what you get: A brief, nontechnical introduction to overfitting in regression-type models. Psychosomatic Medicine, 66(3), 411-421. https://doi.org/10.1097/01.psy.0000127692.23278.a9
Dormann, C. F., Elith, J., Bacher, S., et al. (2013). Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27-46. https://doi.org/10.1111/j.1600-0587.2012.07348.x
Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129-133. https://doi.org/10.1080/00031305.2016.1154108

What this page does not cover

Binary or bounded outcome regression → see Logistic Regression and AUC/ROC
Non-linear curve fitting → see Non-linear Regression and IC50/4PL
Repeated measures or clustered data → see Repeated Measures and Mixed Models
Predictor structure quality checks → see Regression Diagnostics Guard
Full residual diagnostics and influence analysis → see the diagnostic plot pages linked from Regression Diagnostics Guard