Missing Data and Attrition

Missing data is present in most real datasets. How it is handled — whether observations are dropped, values are imputed, or the missing pattern is disclosed — affects both the validity of the analysis and its transparency.

Licklider detects missing values, provides tools to handle them before analysis, and requires that any imputation be disclosed when claim-bearing output is produced.

What Licklider detects

When a dataset is loaded, Licklider records the total number of missing values, the missing rate per column, and how these change when preprocessing is applied. This information is visible in the Prep panel and in the Preprocessing Audit Log.
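The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not Licklider's implementation: it assumes a table stored as a list of dicts with None marking a missing cell, and `missing_summary` is a hypothetical helper. It computes the total missing count and per-column missing rate before and after one preprocessing step.

```python
from typing import Dict, List, Optional

# Hypothetical toy table: each row maps a column name to a value; None marks a missing cell.
Row = Dict[str, Optional[float]]


def missing_summary(rows: List[Row]) -> Dict[str, object]:
    """Return the total number of missing cells and the missing rate per column."""
    columns = sorted({col for row in rows for col in row})
    rate_per_column: Dict[str, float] = {}
    total_missing = 0
    for col in columns:
        n_missing = sum(1 for row in rows if row.get(col) is None)
        total_missing += n_missing
        rate_per_column[col] = n_missing / len(rows) if rows else 0.0
    return {"total_missing": total_missing, "rate_per_column": rate_per_column}


rows: List[Row] = [
    {"age": 34.0, "score": 7.0},
    {"age": None, "score": 5.0},
    {"age": 51.0, "score": None},
    {"age": None, "score": 6.0},
]

before = missing_summary(rows)
# A preprocessing step: drop rows whose "score" is missing.
after = missing_summary([row for row in rows if row["score"] is not None])
print("before:", before)
print("after: ", after)
```

Dropping the row with a missing score lowers the total from 3 to 2 but raises the missing rate for age from 0.5 to about 0.67, which is why it helps to record the rates after each step and not only at load time.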

At the figure level, Licklider evaluates how missing values affect the analysis N — how many rows are available for the comparison or regression — and whether the reduction from the input N is substantial enough to require disclosure.
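The analysis N depends on which columns a figure actually uses, so two figures built from the same table can lose different numbers of rows. A minimal sketch, assuming complete-case counting on the used columns; `analysis_n` and `needs_disclosure` are hypothetical helpers, and the 5% threshold is an arbitrary placeholder because this page does not state Licklider's actual rule.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]  # None marks a missing cell


def analysis_n(rows: List[Row], used_columns: Sequence[str]) -> int:
    """Count rows that have a value in every column the analysis uses."""
    return sum(1 for row in rows if all(row.get(col) is not None for col in used_columns))


def needs_disclosure(input_n: int, n_analysis: int, threshold: float = 0.05) -> bool:
    """Flag a figure when the fraction of rows lost exceeds `threshold`.

    The 5% default is a placeholder for illustration, not Licklider's actual rule.
    """
    if input_n == 0:
        return False
    return (input_n - n_analysis) / input_n > threshold


rows: List[Row] = [
    {"group": "A", "dose": 10.0, "response": 3.2},
    {"group": "A", "dose": None, "response": 2.9},
    {"group": "B", "dose": 20.0, "response": None},
    {"group": "B", "dose": 20.0, "response": 4.1},
    {"group": "B", "dose": 10.0, "response": 3.8},
]

input_n = len(rows)
comparison_n = analysis_n(rows, ["group", "response"])  # group comparison of response
regression_n = analysis_n(rows, ["dose", "response"])   # regression of response on dose
print(input_n, comparison_n, regression_n, needs_disclosure(input_n, regression_n))
```

Here the group comparison keeps 4 of 5 rows, while the regression keeps only 3, because it also needs a non-missing dose.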

Handling missing values

Missing values can be addressed in the Prep panel before analysis. Four imputation methods are available:

Method	Description
Fill with constant	Replace missing values with a fixed value you specify
Fill with mode	Replace with the most frequently occurring value in the column
Fill with median	Replace with the median of non-missing values
Fill with mean	Replace with the mean of non-missing values

Alternatively, rows with missing values on a specified column can be dropped entirely.
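To make the table concrete, the five actions can be sketched as follows. This is an illustration of the general technique under stated assumptions (list-of-dict table, None as missing, Python 3.8+ where `statistics.mode` returns the first mode on ties); `fill_column` and `drop_missing` are hypothetical names, and Licklider's own handling of details such as ties may differ.

```python
import statistics
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]  # None marks a missing cell


def fill_column(
    rows: List[Row], column: str, method: str, constant: Optional[Any] = None
) -> Tuple[List[Row], int]:
    """Return a filled copy of `rows` and the number of cells imputed in `column`."""
    observed = [row[column] for row in rows if row.get(column) is not None]
    if method == "constant":
        value = constant
    elif method == "mode":
        # On Python 3.8+ ties resolve to the first mode encountered.
        value = statistics.mode(observed)
    elif method == "median":
        value = statistics.median(observed)
    elif method == "mean":
        value = statistics.mean(observed)
    else:
        raise ValueError(f"unknown method: {method!r}")

    filled: List[Row] = []
    n_imputed = 0
    for row in rows:
        new_row = dict(row)  # copy so the input table is left untouched
        if new_row.get(column) is None:
            new_row[column] = value
            n_imputed += 1
        filled.append(new_row)
    return filled, n_imputed


def drop_missing(rows: List[Row], column: str) -> List[Row]:
    """Keep only rows that have a value in `column`."""
    return [row for row in rows if row.get(column) is not None]


rows: List[Row] = [{"x": v} for v in [2.0, 4.0, 4.0, 10.0, None, None]]

for method in ("constant", "mode", "median", "mean"):
    filled, n = fill_column(rows, "x", method, constant=0.0)
    print(method, n, [row["x"] for row in filled])
print("dropped:", len(rows) - len(drop_missing(rows, "x")))
```

Returning the number of imputed cells alongside the filled table mirrors the information the preprocessing log keeps for each action.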

These methods are provided as transparent preprocessing options, not as a guarantee that the resulting analysis is statistically optimal. Simple mean, median, mode, or constant imputation can be useful for exploration or for specific operational needs, but these rules do not solve every missing-data problem and, if used without care, can understate uncertainty or distort relationships between variables.
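A small worked example shows how simple imputation can understate uncertainty. The numbers below are made up for illustration: four observed values and two mean-imputed cells. The standard error here is the naive one a downstream analysis would compute if it treated the imputed cells as real observations.

```python
import math
import statistics

observed = [2.0, 4.0, 4.0, 10.0]  # non-missing values in a column
n_missing = 2                     # cells that will be mean-imputed

imputed = observed + [statistics.mean(observed)] * n_missing


def standard_error(values):
    """Naive standard error of the mean, treating every value as a real observation."""
    return statistics.stdev(values) / math.sqrt(len(values))


for label, values in (("observed only", observed), ("mean-imputed", imputed)):
    print(
        f"{label}: n={len(values)}, mean={statistics.mean(values):.2f}, "
        f"variance={statistics.variance(values):.2f}, SE={standard_error(values):.3f}"
    )
```

The mean stays at 5, but the sample variance falls from 12 to 7.2 and the naive standard error falls from about 1.73 to about 1.10: the imputed cells add no new information, yet the filled column looks both less variable and more precisely estimated.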

Each action is recorded in the preprocessing log with the method used, the columns affected, and the number of cells imputed. The figure's disclosure text is generated from this record automatically.
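As a rough sketch of how disclosure text could be derived from such a record, the example below renders log entries into a sentence. The record fields follow the three items named above (method, columns, cells imputed); the `ImputationRecord` class, the `disclosure_text` function, and the exact wording are hypothetical and do not reproduce Licklider's generated text.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ImputationRecord:
    """One hypothetical preprocessing-log entry for an imputation action."""

    method: str                     # "constant", "mode", "median", or "mean"
    columns: List[str]              # columns the action touched
    cells_imputed: int              # number of cells that were filled
    constant: Optional[Any] = None  # only used when method == "constant"


def disclosure_text(records: List[ImputationRecord]) -> str:
    """Render a plain-language disclosure sentence from the log entries."""
    if not records:
        return "No imputation was applied."
    clauses = []
    for rec in records:
        noun, verb = ("cell", "was") if rec.cells_imputed == 1 else ("cells", "were")
        source = (
            f"a constant value of {rec.constant}"
            if rec.method == "constant"
            else f"the column {rec.method}"
        )
        clauses.append(
            f"{rec.cells_imputed} missing {noun} in {', '.join(rec.columns)} {verb} replaced with {source}"
        )
    return "; ".join(clauses) + "."


log = [
    ImputationRecord(method="mean", columns=["age"], cells_imputed=2),
    ImputationRecord(method="constant", columns=["site"], cells_imputed=1, constant=0),
]
print(disclosure_text(log))
```

Generating the sentence from the log, rather than writing it by hand, keeps the reported counts tied to what the preprocessing step actually did.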

Missing data mechanism

Whether missing data is missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR) has implications for which analyses are valid and how the missingness should be reported.

Licklider does not automatically determine the missing data mechanism. When a figure's analysis is affected by missing values, Licklider asks you to confirm which mechanism you believe applies:

MCAR assumed: the probability of a value being missing is unrelated to both the observed data and the missing values themselves. Standard complete-case analysis is generally valid, although it reduces the sample size.
MAR assumed: the probability of being missing depends only on observed data, not on the missing values themselves. Imputation methods that use the observed data can help reduce bias.
MNAR assumed: the probability of being missing depends on the unobserved values themselves, even after accounting for the observed data. Standard imputation does not remove the resulting bias; sensitivity analysis or specialized models are needed.

This selection is recorded in the figure's disclosure.
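A quick simulation illustrates why the mechanism matters for a complete-case estimate. This sketch uses arbitrary, made-up settings (normal data with mean 50 and SD 10, a 30% MCAR rate, and an MNAR rule that hides large values) and covers only MCAR and MNAR; showing MAR would require an additional observed covariate that drives the missingness.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible
true_values = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

# MCAR: every value has the same 30% chance of being missing, regardless of its size.
mcar_observed = [v for v in true_values if rng.random() >= 0.30]

# MNAR: values above 55 go missing 60% of the time; smaller values are never missing,
# so whether a value is missing depends on the (unobserved) value itself.
mnar_observed = [v for v in true_values if not (v > 55.0 and rng.random() < 0.60)]

true_mean = statistics.mean(true_values)
for label, values in (("MCAR", mcar_observed), ("MNAR", mnar_observed)):
    cc_mean = statistics.mean(values)  # complete-case estimate
    print(f"{label}: kept {len(values)} rows, complete-case mean {cc_mean:.2f} "
          f"vs full-data mean {true_mean:.2f}")
```

Under MCAR the complete-case mean stays close to the full-data mean. Under MNAR it is pulled downward because large values are disproportionately missing, and filling the gaps with the mean or median of the observed values would not correct that either.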

Licklider also cannot determine automatically whether the selected imputation method is statistically appropriate for your specific analysis. A complete-case analysis may discard too much information, and a simple imputation rule can make the results look more certain than they are if it is not compatible with the missingness mechanism.

How missing data relates to attrition

When rows are dropped due to missing values, the reduction is reflected in the sample attrition trail as a decrease in analysis N relative to input N. The attrition trail shows the total number of dropped rows and the fraction relative to the input, broken down by group when group information is available.
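The per-group breakdown can be sketched as a comparison of group counts before and after rows are dropped. This is a minimal illustration with made-up data, assuming every row carries a group label; `attrition_by_group` is a hypothetical helper, not the attrition trail's actual implementation.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]  # None marks a missing cell


def attrition_by_group(
    input_rows: List[Row], analysis_rows: List[Row], group_col: str
) -> Dict[Any, Dict[str, Any]]:
    """Per-group input N, analysis N, dropped rows, and fraction dropped."""
    n_input = Counter(row[group_col] for row in input_rows)
    n_analysis = Counter(row[group_col] for row in analysis_rows)
    summary = {}
    for group in sorted(n_input):
        dropped = n_input[group] - n_analysis[group]
        summary[group] = {
            "input_n": n_input[group],
            "analysis_n": n_analysis[group],
            "dropped": dropped,
            "fraction_dropped": dropped / n_input[group],
        }
    return summary


rows: List[Row] = [
    {"group": "A", "score": 1.0},
    {"group": "A", "score": None},
    {"group": "A", "score": 3.0},
    {"group": "A", "score": 4.0},
    {"group": "B", "score": None},
    {"group": "B", "score": 6.0},
]
analysis_rows = [row for row in rows if row["score"] is not None]

total_dropped = len(rows) - len(analysis_rows)
print(f"total: {total_dropped} of {len(rows)} rows dropped ({total_dropped / len(rows):.0%})")
for group, stats in attrition_by_group(rows, analysis_rows, "group").items():
    print(group, stats)
```

Overall a third of the rows are lost, but group B loses half of its rows while group A loses a quarter; the per-group view makes that imbalance visible where the total alone would not.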

For more detail on how the attrition trail is recorded and displayed → see N Disclosure and Attrition Trail.

Disclosure requirements

When imputation has been applied, Licklider requires that it be acknowledged before claim-bearing export is allowed. The acknowledgment must confirm that the imputation was appropriate and state how it is reported.

If imputation has not been acknowledged, the Inspector will indicate that the disclosure is unresolved.

The acknowledgment options reflect how the imputation will be reported:

Disclosed: the imputation is described in the methods text
Exploratory only: the imputation was applied for exploration, and the result will not be used to support a claim
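The gating rule in this section reduces to a small state check. A minimal sketch, assuming a figure's state is just a flag for whether imputation was applied plus an optional acknowledgment; `Acknowledgment`, `disclosure_status`, and `claim_bearing_export_allowed` are hypothetical names. The sketch does not model anything Licklider may do downstream with an Exploratory only acknowledgment.

```python
from enum import Enum
from typing import Optional


class Acknowledgment(Enum):
    DISCLOSED = "Disclosed"                # described in the methods text
    EXPLORATORY_ONLY = "Exploratory only"  # result will not be used to support a claim


def disclosure_status(imputation_applied: bool, ack: Optional[Acknowledgment]) -> str:
    """Return 'not required', 'unresolved', or 'resolved' for a figure."""
    if not imputation_applied:
        return "not required"
    if ack is None:
        return "unresolved"
    return "resolved"


def claim_bearing_export_allowed(imputation_applied: bool, ack: Optional[Acknowledgment]) -> bool:
    """Block claim-bearing export while an imputation disclosure is unresolved."""
    return disclosure_status(imputation_applied, ack) != "unresolved"


print(disclosure_status(True, None))
print(disclosure_status(True, Acknowledgment.DISCLOSED))
print(claim_bearing_export_allowed(True, None))
```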

Design Rationale & References

This page follows a simple rule: missing data handling should be visible, reviewable, and tied to an explicit assumption about why the data are missing. That is why Licklider records missingness before and after preprocessing, links row loss to the attrition trail, and requires claim-bearing outputs to disclose when imputation has been applied.

The missingness-mechanism prompt is also intentional. Whether data are MCAR, MAR, or MNAR changes how defensible a complete-case or imputed analysis may be [1, 2]. Licklider does not infer that mechanism from the table alone because the key question is often scientific rather than purely computational: why the values are absent, and what that absence means for the conclusion.

The available imputation options are intentionally simple and transparent. They provide lightweight preprocessing paths that can be described clearly in the audit trail, but they are not presented as a universal substitute for principled missing-data modeling. Requiring disclosure helps prevent a filled-in dataset from being treated as if no uncertainty were added by the missingness process itself.

[1] Rubin, D. B. (1976). Inference and Missing Data. Biometrika, 63(3), 581-592.
[2] Little, R. J. A., & Rubin, D. B. (2019). Statistical Analysis with Missing Data (3rd ed.). Wiley.

What this page does not cover

How imputation actions are applied → see the Prep panel in the dataset view
How the preprocessing record is read → see Preprocessing Audit Log
How sample attrition is tracked and disclosed → see N Disclosure and Attrition Trail