Regression Diagnostics Guard
How Licklider evaluates the predictor structure of regression models before claim-bearing output is allowed, what the guard checks for, and where the current guard has limits.
When a regression model includes multiple predictors, several structural problems can make the model unreliable: too many predictors relative to the available data, predictors that are highly correlated with each other, or predictors whose names suggest they measure the same underlying variable.
Licklider checks the predictor structure of multi-predictor regression models before claim-bearing output is allowed.
What the guard checks
The guard evaluates the following properties of the predictor set:
Sample size relative to predictor count
A regression model requires enough observations to estimate each predictor's coefficient reliably. When the ratio of observations to predictors is low, the model is at risk of overfitting - fitting noise in the sample rather than the underlying relationship.
The thresholds used here are guard heuristics, not universal mathematical cutoffs. Licklider uses them as early warning levels because instability and overfitting risk rise quickly once model complexity approaches the available sample size [1]. A short sketch of this check follows the thresholds below.
- A ratio below 5 observations per predictor is treated as high risk
- A ratio below 10 observations per predictor is treated as medium risk
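To make the heuristic concrete, here is a minimal Python sketch of the ratio check. The function name and risk labels are illustrative assumptions for this page, not Licklider's internal code.

```python
def ratio_risk(n_obs: int, n_predictors: int) -> str:
    """Classify overfitting risk from the observations-per-predictor ratio.

    Thresholds mirror the guard heuristics above: below 5 observations
    per predictor is high risk, below 10 is medium risk.
    """
    if n_predictors < 1:
        raise ValueError("the model must have at least one predictor")
    ratio = n_obs / n_predictors
    if ratio < 5:
        return "high"
    if ratio < 10:
        return "medium"
    return "low"

# 60 observations over 8 predictors gives 7.5 obs/predictor -> "medium".
print(ratio_risk(60, 8))
```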
Predictor collinearity
When two predictors are highly correlated with each other, the model cannot reliably separate their individual contributions. The coefficients become unstable and standard errors inflate.
Licklider uses pairwise correlation here as a fast, transparent first-pass screen for overlapping predictors. It does not claim that pairwise correlation is a complete collinearity diagnostic, but it is easy to interpret and catches many common cases before users over-read unstable coefficients [2]. A sketch of this screen appears after the thresholds below.
- A maximum absolute pairwise correlation of 0.90 or above is treated as high risk
- A maximum absolute pairwise correlation of 0.85 or above is treated as medium risk
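A first-pass screen like the one described above can be written in a few lines of pandas. This is a sketch of the documented heuristic under the stated thresholds; `collinearity_risk` is a hypothetical name, not a Licklider API.

```python
import numpy as np
import pandas as pd

def collinearity_risk(X: pd.DataFrame) -> str:
    """Classify risk from the largest absolute pairwise correlation."""
    corr = X.corr().abs().to_numpy()
    np.fill_diagonal(corr, 0.0)  # ignore each predictor's self-correlation
    max_r = corr.max()
    if max_r >= 0.90:
        return "high"
    if max_r >= 0.85:
        return "medium"
    return "low"

# x2 is x1 plus a little noise, so the pair correlates well above 0.90.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.1, size=200),
    "x3": rng.normal(size=200),
})
print(collinearity_risk(X))  # -> "high"
```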
Duplicate or alias predictors
When the predictor set contains columns with identical names, near-identical names, or names that suggest they are transformations of each other, this is flagged as high risk. Including aliases in the same model is almost always a mistake.
This check is intentionally conservative because alias predictors often reflect accidental duplicate columns, derived columns re-entered as if they were independent predictors, or naming collisions that make the model hard to interpret even before formal diagnostics begin. A sketch of a simple name-based screen appears below.
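A name-based alias screen might look something like the following sketch. The normalization rules and suffix list are assumptions chosen for illustration; Licklider's actual matching logic is not documented here.

```python
import re
from itertools import combinations

def name_alias_pairs(columns: list[str]) -> list[tuple[str, str]]:
    """Flag predictor pairs whose names collapse to the same token after
    stripping case, separators, and common transformation suffixes.

    The suffix list is illustrative; a real screen would use a richer set.
    """
    suffixes = ("_log", "_sq", "_scaled", "_z", "_copy", "_2")

    def normalize(name: str) -> str:
        key = name.lower()
        for s in suffixes:
            if key.endswith(s):
                key = key[: -len(s)]
        return re.sub(r"[^a-z0-9]", "", key)

    return [(a, b) for a, b in combinations(columns, 2)
            if normalize(a) == normalize(b)]

# "income" and "income_log" normalize to the same key and are flagged,
# as are "age" and "Age " (case and whitespace collision).
print(name_alias_pairs(["income", "income_log", "age", "Age "]))
```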
What the guard does not check
The guard evaluates predictor structure, not residual diagnostics. The following are not checked automatically:
- Residual normality and homoscedasticity
- Influential points (Cook's distance or leverage)
- Variance inflation factors (VIF)
Licklider also does not guarantee detection of every predictor-side problem. This guard does not reliably detect nonlinear predictor-outcome relationships, collinearity that appears only after categorical encoding or interaction expansion, or apparent correlation driven by a small number of extreme rows.
These diagnostics require inspecting the model's fitted values and residuals, so they can only run after the model has been estimated. If you need to evaluate these properties, request them in the Chat or inspect the Residual Plot after the fit; a manual VIF check is also sketched below.
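For readers who want to run the VIF check themselves, the following sketch shows one common approach using statsmodels. This is a standard manual follow-up outside the guard, and `vif_table` is a hypothetical helper name, not a Licklider feature.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X: pd.DataFrame) -> pd.Series:
    """Return one VIF per predictor; values above roughly 5-10 are a
    common informal signal of problematic collinearity."""
    Xc = sm.add_constant(X)  # VIF is conventionally computed with an intercept
    return pd.Series({
        col: variance_inflation_factor(Xc.to_numpy(), i)
        for i, col in enumerate(Xc.columns)
        if col != "const"
    })

# Usage: vif_table(df[["x1", "x2", "x3"]]) after assembling the predictors.
```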
Those limits matter because a model can pass this guard and still be misleading. Unchecked nonlinearity can bias coefficients, influential points can dominate the fit, and residual problems can make p-values and confidence intervals look more trustworthy than they really are. Passing the guard means the predictor set looks structurally safer, not that the full regression assumptions have been validated.
What you are asked to confirm
When the guard detects a structural problem, it presents three options:
Acknowledged - proceed with disclosure
You have reviewed the predictor structure, accept that the identified risk is present, and will disclose it in the methods text. This is the appropriate path when the model structure is intentional and the limitation is acknowledged.
Predictor set reduced
You have removed predictors to address the identified risk - for example, dropping a collinear predictor or reducing the model complexity given the sample size. The analysis will reflect the updated predictor set.
Descriptive only
The structural problem is unresolved. The result will be treated as descriptive and is not eligible for claim-bearing export.
Effect on export
When the guard is unresolved, claim-bearing export is blocked. The Inspector will indicate which aspect of the predictor structure requires confirmation.
When the risk level is low - for example, when collinearity is present but below the medium-risk threshold - a note is added to the figure's disclosure without requiring confirmation.
This export behavior reflects the role of the page: the guard blocks claim-bearing output only when the predictor structure looks risky enough that the coefficients may be too unstable to interpret without an explicit decision. It is not a certificate that the model is globally valid.
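The gating described above can be summarized as a small decision function. All names here (`Risk`, `Resolution`, `claim_bearing_export_allowed`) are hypothetical illustrations of the documented behavior, not Licklider's API.

```python
from enum import Enum

class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class Resolution(Enum):
    UNRESOLVED = "unresolved"
    ACKNOWLEDGED = "acknowledged"            # proceed with disclosure
    PREDICTORS_REDUCED = "predictors_reduced"
    DESCRIPTIVE_ONLY = "descriptive_only"    # never claim-bearing

def claim_bearing_export_allowed(risk: Risk, resolution: Resolution) -> bool:
    """Mirror the documented gating: low risk exports with a disclosure
    note, medium/high risk needs an explicit resolution, and unresolved
    or descriptive-only results are blocked from claim-bearing export."""
    if risk is Risk.LOW:
        return True  # a note is added to the disclosure; no confirmation needed
    return resolution in (Resolution.ACKNOWLEDGED, Resolution.PREDICTORS_REDUCED)

print(claim_bearing_export_allowed(Risk.HIGH, Resolution.UNRESOLVED))  # False
```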
What this page does not cover
- Residual plot interpretation -> see Residual Plot
- How proportional or bounded outcomes affect regression choice -> see Proportion Data OLS Prevention
- Survival regression -> see Cox Proportional Hazards Regression
Design Rationale & References
Licklider's design choices
Licklider places this guard before claim-bearing export because predictor-structure problems can make a regression look precise even when the coefficients are too unstable to support a scientific claim. Too many predictors for the available sample invites overfitting, and heavily overlapping predictors make it difficult to attribute effects cleanly to any one term [1, 2].
The current thresholds are meant as practical warning bands rather than as universal truths. A ratio below 10 observations per predictor and pairwise correlations above roughly 0.85 are treated as signals that the model deserves extra caution; more severe values push the result toward confirmation or descriptive-only handling. Licklider uses these heuristics to keep the guard simple, legible, and conservative for non-specialist users rather than hiding the logic behind a more opaque diagnostic stack.
Licklider also separates this predictor-structure guard from residual diagnostics on purpose. Predictor structure can often be assessed before interpretation, while residual behavior and influence depend on the fit itself and are better treated as a second layer of model checking rather than folded into one pass-fail score.
Methodological foundations
[1] Babyak, M. A. (2004). What You See May Not Be What You Get: A Brief, Nontechnical Introduction to Overfitting in Regression-Type Models. Psychosomatic Medicine, 66(3), 411-421.
→ Explains why too many predictors relative to sample size can produce unstable, overfit regression models that appear stronger than they really are.
[2] Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., García Marquéz, J. R., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller, T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore, A. K., Zurell, D., & Lautenbach, S. (2013). Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27-46.
→ Reviews the practical consequences of collinearity and supports treating heavy predictor overlap as a pre-interpretation risk.
Implementation boundaries
- This guard evaluates predictor structure before claim-bearing interpretation; it does not certify that the fitted regression is fully valid.
- Licklider does not automatically detect every source of instability. Nonlinearity, influence, residual pathologies, and some encoded collinearity patterns can still pass through this guard.
- The predictor-ratio and pairwise-correlation cutoffs are heuristics used for warning and escalation, not universal scientific constants.
- If the model passes this guard, you should still inspect residual diagnostics and consider whether the study design supports the regression assumptions.