Regression Diagnostics Guard
How Licklider evaluates the predictor structure of regression models before claim-bearing output is allowed, what the guard checks for, and where the current guard has limits.
When a regression model includes multiple predictors, several structural problems can make the model unreliable: too many predictors relative to the available data, predictors that are highly correlated with each other, or predictors whose names suggest they measure the same underlying variable.
Licklider checks the predictor structure of multi-predictor regression models before claim-bearing output is allowed.
What the guard checks
The guard evaluates the following properties of the predictor set:
Sample size relative to predictor count
A regression model requires enough observations to estimate each predictor's coefficient reliably. When the ratio of observations to predictors is low, the model is at risk of overfitting - fitting noise in the sample rather than the underlying relationship.
The thresholds used here are guard heuristics, not universal mathematical cutoffs. Licklider uses them as early warning levels because instability and overfitting risk rise quickly once model complexity approaches the available sample size [1]. A short sketch of this check follows the thresholds below.
- A ratio below 5 observations per predictor is treated as high risk
- A ratio below 10 observations per predictor is treated as medium risk
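To make the heuristic concrete, here is a minimal Python sketch of the ratio check. The function name and risk labels are illustrative assumptions for this page, not Licklider's internal code.

```python
def ratio_risk(n_obs: int, n_predictors: int) -> str:
    """Classify overfitting risk from the observations-per-predictor ratio.

    Thresholds mirror the guard heuristics above: below 5 observations
    per predictor is high risk, below 10 is medium risk.
    """
    if n_predictors < 1:
        raise ValueError("the model must have at least one predictor")
    ratio = n_obs / n_predictors
    if ratio < 5:
        return "high"
    if ratio < 10:
        return "medium"
    return "low"

# 60 observations over 8 predictors gives 7.5 obs/predictor -> "medium".
print(ratio_risk(60, 8))
```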
Predictor collinearity
When two predictors are highly correlated with each other, the model cannot reliably separate their individual contributions. The coefficients become unstable and standard errors inflate.
Licklider uses pairwise correlation here as a fast, transparent first-pass screen for overlapping predictors. It does not claim that pairwise correlation is a complete collinearity diagnostic, but it is easy to interpret and catches many common cases before users over-read unstable coefficients [2]. A sketch of this screen appears after the thresholds below.
- A maximum absolute pairwise correlation of 0.90 or above is treated as high risk
- A maximum absolute pairwise correlation of 0.85 or above is treated as medium risk
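A first-pass screen like the one described above can be written in a few lines of pandas. This is a sketch of the documented heuristic under the stated thresholds; `collinearity_risk` is a hypothetical name, not a Licklider API.

```python
import numpy as np
import pandas as pd

def collinearity_risk(X: pd.DataFrame) -> str:
    """Classify risk from the largest absolute pairwise correlation."""
    corr = X.corr().abs().to_numpy()
    np.fill_diagonal(corr, 0.0)  # ignore each predictor's self-correlation
    max_r = corr.max()
    if max_r >= 0.90:
        return "high"
    if max_r >= 0.85:
        return "medium"
    return "low"

# x2 is x1 plus a little noise, so the pair correlates well above 0.90.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.1, size=200),
    "x3": rng.normal(size=200),
})
print(collinearity_risk(X))  # -> "high"
```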
Duplicate or alias predictors
When the predictor set contains columns with identical names, near-identical names, or names that suggest they are transformations of each other, this is flagged as high risk. Including aliases in the same model is almost always a mistake.
This check is intentionally conservative because alias predictors often reflect accidental duplicate columns, derived columns re-entered as if they were independent predictors, or naming collisions that make the model hard to interpret even before formal diagnostics begin. A sketch of a simple name-based screen appears below.
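A name-based alias screen might look something like the following sketch. The normalization rules and suffix list are assumptions chosen for illustration; Licklider's actual matching logic is not documented here.

```python
import re
from itertools import combinations

def name_alias_pairs(columns: list[str]) -> list[tuple[str, str]]:
    """Flag predictor pairs whose names collapse to the same token after
    stripping case, separators, and common transformation suffixes.

    The suffix list is illustrative; a real screen would use a richer set.
    """
    suffixes = ("_log", "_sq", "_scaled", "_z", "_copy", "_2")

    def normalize(name: str) -> str:
        key = name.lower()
        for s in suffixes:
            if key.endswith(s):
                key = key[: -len(s)]
        return re.sub(r"[^a-z0-9]", "", key)

    return [(a, b) for a, b in combinations(columns, 2)
            if normalize(a) == normalize(b)]

# "income" and "income_log" normalize to the same key and are flagged,
# as are "age" and "Age " (case and whitespace collision).
print(name_alias_pairs(["income", "income_log", "age", "Age "]))
```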
What the guard does not check
The guard evaluates predictor structure, not residual diagnostics. The following are not checked automatically:
- Residual normality and homoscedasticity
- Influential points (Cook's distance or leverage)
- Variance inflation factors (VIF)
Licklider also does not guarantee detection of every predictor-side problem. This guard does not reliably detect nonlinear predictor-outcome relationships, collinearity that appears only after categorical encoding or interaction expansion, or apparent correlation driven by a small number of extreme rows.
These diagnostics require inspecting the model's fitted values and residuals, so they can only run after the model has been estimated. If you need to evaluate these properties, request them in the Chat or inspect the Residual Plot after the fit; a manual VIF check is also sketched below.
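For readers who want to run the VIF check themselves, the following sketch shows one common approach using statsmodels. This is a standard manual follow-up outside the guard, and `vif_table` is a hypothetical helper name, not a Licklider feature.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X: pd.DataFrame) -> pd.Series:
    """Return one VIF per predictor; values above roughly 5-10 are a
    common informal signal of problematic collinearity."""
    Xc = sm.add_constant(X)  # VIF is conventionally computed with an intercept
    return pd.Series({
        col: variance_inflation_factor(Xc.to_numpy(), i)
        for i, col in enumerate(Xc.columns)
        if col != "const"
    })

# Usage: vif_table(df[["x1", "x2", "x3"]]) after assembling the predictors.
```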
Those limits matter because a model can pass this guard and still be misleading. Unchecked nonlinearity can bias coefficients, influential points can dominate the fit, and residual problems can make p-values and confidence intervals look more trustworthy than they really are. Passing the guard means the predictor set looks structurally safer, not that the full regression assumptions have been validated.
What you are asked to confirm
When the guard detects a structural problem, it presents three options:
Acknowledged - proceed with disclosure
You have reviewed the predictor structure, accept that the identified risk is present, and will disclose it in the methods text. This is the appropriate path when the model structure is intentional and the limitation is acknowledged.
Predictor set reduced
You have removed predictors to address the identified risk - for example, dropping a collinear predictor or reducing the model complexity given the sample size. The analysis will reflect the updated predictor set.
Descriptive only
The structural problem is unresolved. The result will be treated as descriptive and is not eligible for claim-bearing export.
Effect on export
When the guard is unresolved, claim-bearing export is blocked. The Inspector will indicate which aspect of the predictor structure requires confirmation.
When the risk level is low - for example, when collinearity is present but below the medium-risk threshold - a note is added to the figure's disclosure without requiring confirmation.
This export behavior reflects the role of the page: the guard blocks claim-bearing output only when the predictor structure looks risky enough that the coefficients may be too unstable to interpret without an explicit decision. It is not a certificate that the model is globally valid.
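The gating described above can be summarized as a small decision function. All names here (`Risk`, `Resolution`, `claim_bearing_export_allowed`) are hypothetical illustrations of the documented behavior, not Licklider's API.

```python
from enum import Enum

class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class Resolution(Enum):
    UNRESOLVED = "unresolved"
    ACKNOWLEDGED = "acknowledged"            # proceed with disclosure
    PREDICTORS_REDUCED = "predictors_reduced"
    DESCRIPTIVE_ONLY = "descriptive_only"    # never claim-bearing

def claim_bearing_export_allowed(risk: Risk, resolution: Resolution) -> bool:
    """Mirror the documented gating: low risk exports with a disclosure
    note, medium/high risk needs an explicit resolution, and unresolved
    or descriptive-only results are blocked from claim-bearing export."""
    if risk is Risk.LOW:
        return True  # a note is added to the disclosure; no confirmation needed
    return resolution in (Resolution.ACKNOWLEDGED, Resolution.PREDICTORS_REDUCED)

print(claim_bearing_export_allowed(Risk.HIGH, Resolution.UNRESOLVED))  # False
```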
What this page does not cover
- Residual plot interpretation -> see Residual Plot
- How proportional or bounded outcomes affect regression choice -> see Proportion Data OLS Prevention
- Survival regression -> see Cox Proportional Hazards Regression
Design Rationale & References
Licklider's design choices
Licklider places this guard before claim-bearing export because predictor-structure problems can make a regression look precise even when the coefficients are too unstable to support a scientific claim. Too many predictors for the available sample invites overfitting, and heavily overlapping predictors make it difficult to attribute effects cleanly to any one term [1, 2].
The current thresholds are meant as practical warning bands rather than as universal truths. A ratio below 10 observations per predictor and pairwise correlations above roughly 0.85 are treated as signals that the model deserves extra caution; more severe values push the result toward confirmation or descriptive-only handling. Licklider uses these heuristics to keep the guard simple, legible, and conservative for non-specialist users rather than hiding the logic behind a more opaque diagnostic stack.
Licklider also separates this predictor-structure guard from residual diagnostics on purpose. Predictor structure can often be assessed before interpretation, while residual behavior and influence depend on the fit itself and are better treated as a second layer of model checking rather than folded into one pass-fail score.
Methodological foundations
[1] Babyak, M. A. (2004). What You See May Not Be What You Get: A Brief, Nontechnical Introduction to Overfitting in Regression-Type Models. Psychosomatic Medicine, 66(3), 411-421.
→ Explains why too many predictors relative to sample size can produce unstable, overfit regression models that appear stronger than they really are.
[2] Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., García Marquéz, J. R., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller, T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore, A. K., Zurell, D., & Lautenbach, S. (2013). Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27-46.
→ Reviews the practical consequences of collinearity and supports treating heavy predictor overlap as a pre-interpretation risk.
Implementation boundaries
- This guard evaluates predictor structure before claim-bearing interpretation; it does not certify that the fitted regression is fully valid.
- Licklider does not automatically detect every source of instability. Nonlinearity, influence, residual pathologies, and some encoded collinearity patterns can still pass through this guard.
- The predictor-ratio and pairwise-correlation cutoffs are heuristics used for warning and escalation, not universal scientific constants.
- If the model passes this guard, you should still inspect residual diagnostics and consider whether the study design supports the regression assumptions.