Proportion Data OLS Prevention
Why ordinary linear regression is not appropriate for proportion or binary outcomes, and how Licklider detects and responds to this situation.
Ordinary least squares regression (OLS) assumes that the outcome variable is continuous and unbounded. When the outcome is a proportion (values between 0 and 1, or percentages) or a binary variable with only two possible values, this assumption is violated. OLS applied to bounded outcomes can produce predicted values outside the valid range. It can also give misleading standard errors, because the variance of a bounded outcome depends on its mean instead of staying constant.
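To make the out-of-range problem concrete, here is a minimal Python sketch that fits a simple OLS line to a binary outcome, using only the standard library and made-up data. It is an illustration, not how Licklider fits models, and `fit_ols` is a hypothetical helper.

```python
def fit_ols(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up binary outcome: mostly 0 for low x, mostly 1 for high x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

intercept, slope = fit_ols(xs, ys)
for x_new in (0, 4.5, 10):
    # Predictions from a straight line are not constrained to [0, 1].
    print(x_new, round(intercept + slope * x_new, 3))
```

With this data the fitted line has slope 1/6 and intercept -0.25, so it predicts -0.25 at x = 0 and about 1.42 at x = 10: values that are impossible for a 0/1 outcome.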
When Licklider detects that a regression is being applied to a proportion or binary outcome, it surfaces a confirmation before the result can be used in a claim-bearing context.
What triggers the check
The check runs when all of the following are true:
- A regression analysis is requested
- The response variable is identified as a proportion or bounded response - for example, values consistently between 0 and 1, or a column with a name suggesting a percentage, fraction, or rate
- The requested model is not logistic regression
If the response column is identified as binary (exactly two unique values), the same check applies.
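The trigger conditions above can be sketched as a small heuristic. This is a hypothetical Python illustration, not Licklider's actual detection logic: the function names, the keyword list, and the model label strings (`"ols"`, `"logistic"`) are all assumptions, and the functions are assumed to be called only after a regression has been requested.

```python
import re

# Hypothetical name hints. Crude substring matching can misfire
# (for example "rate" inside "accurate"), which is one reason a match
# should prompt the user rather than silently change the model.
NAME_HINTS = re.compile(r"pct|percent|frac|rate|prop", re.IGNORECASE)


def classify_outcome(values, column_name):
    """Classify a response column as 'binary', 'bounded', or 'unbounded'."""
    if len(set(values)) == 2:
        # Exactly two unique values: treat as a binary endpoint.
        return "binary"
    if all(0 <= v <= 1 for v in values) or NAME_HINTS.search(column_name):
        # All values in [0, 1], or a name suggesting a percentage, fraction, or rate.
        return "bounded"
    return "unbounded"


def bounded_ols_check_fires(values, column_name, requested_model):
    """Assumes a regression was requested; fires unless the model is logistic."""
    if requested_model == "logistic":
        return False
    return classify_outcome(values, column_name) in {"binary", "bounded"}


print(bounded_ols_check_fires([0.12, 0.40, 0.85], "survival_fraction", "ols"))  # True
print(bounded_ols_check_fires([3.2, 7.5, 12.0], "height_cm", "ols"))  # False
```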
Licklider cannot determine automatically whether a bounded-looking outcome is truly a proportion, a binary endpoint, a continuous ratio that only happens to lie between 0 and 1 in this dataset, or an already-aggregated summary that should be modeled differently.
These limits matter because the same numeric range can arise from very different data-generating processes. If the outcome is misclassified, the guard may either block a reasonable OLS analysis or, more importantly, allow an override that leaves a claim resting on a poorly matched model.
What you are asked to confirm
When the check fires, Licklider presents three options:
Descriptive only
The regression is shown as a descriptive result. It remains visible but is not eligible for inferential or other claim-bearing use. Use this when you want to visualize the relationship without making a formal statistical claim.
Continuous response confirmed
You confirm that the outcome is genuinely continuous despite appearing bounded - for example, a ratio that happens to fall between 0 and 1 in this dataset but is not constrained to that range by its nature. This resolves the check and allows the OLS result to be used in a claim-bearing context with a disclosure.
This choice records that the bounded appearance was reviewed and treated as compatible with a continuous-response model.
Exploratory OLS override
You acknowledge that OLS is being applied to a bounded outcome for exploratory purposes and will not present the result as a confirmed inferential finding. The result is treated as exploratory.
This keeps the result available for exploration while making clear that the model choice has not been accepted as a claim-bearing default.
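One way to picture how the three choices differ is as a mapping from the selected option to how the result may be used. The Python sketch below is hypothetical: the `Resolution` enum, the `ResultStatus` fields, and the "pending confirmation" state are illustrative names, not Licklider's API, and the mapping only encodes what the options above state.

```python
from dataclasses import dataclass
from enum import Enum


class Resolution(Enum):
    DESCRIPTIVE_ONLY = "descriptive only"
    CONTINUOUS_CONFIRMED = "continuous response confirmed"
    EXPLORATORY_OVERRIDE = "exploratory OLS override"


@dataclass(frozen=True)
class ResultStatus:
    label: str            # how the result is presented
    claim_eligible: bool  # may it back an inferential claim?
    disclosure: bool      # is a disclosure attached to the claim?


# Hypothetical mapping mirroring the three options described above.
STATUS = {
    Resolution.DESCRIPTIVE_ONLY: ResultStatus("descriptive", False, False),
    Resolution.CONTINUOUS_CONFIRMED: ResultStatus("inferential", True, True),
    Resolution.EXPLORATORY_OVERRIDE: ResultStatus("exploratory", False, False),
}


def status_for(resolution):
    """An unresolved check (None) keeps the result out of claim-bearing use."""
    if resolution is None:
        return ResultStatus("pending confirmation", False, False)
    return STATUS[resolution]


print(status_for(Resolution.CONTINUOUS_CONFIRMED))
```

In this sketch only the continuous-response confirmation makes the result claim-eligible, and that path always carries a disclosure; the other two keep the result visible without letting it back an inferential claim.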
The appropriate alternative
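For proportion outcomes, logistic (binomial) regression on the underlying binary observations, or on success and trial counts when those are available, is generally more appropriate. For continuous proportions that lie strictly between 0 and 1, a beta regression model is a common choice. For binary outcomes, logistic regression is the standard choice.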
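For a binary outcome, the sketch below fits logistic regression on the raw 0/1 observations by Newton-Raphson (equivalently, iteratively reweighted least squares), using only the Python standard library and made-up data. It illustrates why the logistic model stays in range; it is not Licklider's implementation, and in practice a statistics library such as statsmodels (`Logit`, or `GLM` with a `Binomial` family) is the more practical route. Beta regression is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iterations=25):
    """Fit logit P(Y=1 | x) = b0 + b1*x by Newton-Raphson.

    Assumes the 0/1 outcomes are not perfectly separated by x;
    with perfect separation the estimates diverge.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0  # X'WX, the negative Hessian
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X'WX)^{-1} X'(y - p), solved for the 2x2 case.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up binary outcome with some overlap between the classes.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
for x_new in (0, 4.5, 10):
    print(x_new, round(sigmoid(b0 + b1 * x_new), 3))
```

Because every prediction passes through the logistic function, fitted probabilities stay strictly inside (0, 1), even at x = 0 and x = 10, outside the observed range.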
To switch to logistic regression, request it in the Chat:
- "Use logistic regression for this outcome"
- "Fit a logistic model"
For more detail on logistic regression -> see Logistic Regression and AUC/ROC.
Design Rationale & References
This page follows a simple rule: the scale and support of the outcome variable should constrain the model used to make an inferential claim. That is why Licklider interrupts OLS when the response appears bounded or binary, offers a descriptive or exploratory path for visualization, and asks for explicit confirmation before allowing a claim-bearing override.
The guard exists because ordinary linear regression can predict impossible values outside the valid range and can misrepresent uncertainty when applied to binary or bounded responses. Logistic regression is the standard default for binary outcomes, and proportion-specific models such as beta regression are typically better aligned with outcomes that are intrinsically bounded between 0 and 1 [1, 2].
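The uncertainty problem follows directly from the moments of a binary outcome. For Y in {0, 1} with p(x) = Pr(Y = 1 | x), the variance is a function of the mean, so it cannot stay constant across x as OLS assumes: it is 0.25 where p(x) = 0.5 but 0.09 where p(x) = 0.1. A linear model for p(x) is also unbounded in x, while the logistic link maps any linear predictor into (0, 1):

```latex
\begin{aligned}
\mathbb{E}[Y \mid x] &= p(x), \qquad \operatorname{Var}(Y \mid x) = p(x)\,\bigl(1 - p(x)\bigr) \\
\text{OLS (linear probability): } p(x) &= \beta_0 + \beta_1 x \quad \text{(unbounded in } x\text{)} \\
\text{Logistic: } \log\frac{p(x)}{1 - p(x)} &= \beta_0 + \beta_1 x
  \;\Longrightarrow\; p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} \in (0, 1)
\end{aligned}
```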
The explicit "continuous response confirmed" path is also intentional. Some variables are ratio-like or otherwise continuous in nature even if the observed values happen to fall between 0 and 1. In those cases, a blanket block would be too rigid. The confirmation step keeps that exception possible while forcing the reasoning into the disclosure trail.
- Warton, D. I., & Hui, F. K. C. (2011). The arcsine is asinine: the analysis of proportions in ecology. Ecology, 92(1), 3-10. https://doi.org/10.1890/10-0340.1
- Ferrari, S. L. P., & Cribari-Neto, F. (2004). Beta Regression for Modelling Rates and Proportions. Journal of Applied Statistics, 31(7), 799-815. https://doi.org/10.1080/0266476042000214501
What this page does not cover
- Logistic regression setup and interpretation -> see Logistic Regression and AUC/ROC
- Compositional data with multiple components -> see Compositional Data Warning
- Outcome type detection and inference -> see Outcome Type and Analysis Intent