Project Statistical Policy and Consistency Audit
How Licklider evaluates a project's effective statistical policy, compares stored figure evidence against that live policy, and summarizes cross-figure consistency risks.
Licklider reviews statistical consistency at the project level. It does this by evaluating an effective project policy, comparing stored figure evidence against that live policy, and summarizing cross-figure consistency issues in the Project Audit panel.
This page explains what the policy means, what the current audit evaluates, how policy evidence is stored with figures, and where the current support boundary sits.
What is a project statistical policy?
A project statistical policy is a structured ruleset that defines how a project is expected to handle core statistical decisions. It includes defaults for alpha, sidedness, assumption handling, declared outlier method, multiple-comparison settings, and reporting expectations.
Every audit is evaluated against an effective policy:
- If the project has an explicit stored policy and it can be read successfully, that policy is used.
- If the project has no explicit stored policy, Licklider falls back to the built-in default policy.
The policy model is broader than the current audit surface. The current audit focuses on the parts of the policy that are already captured reliably in figure evidence and figure-level metadata.
Default policy
When no explicit project policy is stored, Licklider uses the following default values:
| Setting | Default |
|---|---|
| Schema version | 1 |
| Significance level (alpha.default) | 0.05 |
| Alpha exceptions allowed | true |
| Default sidedness | two_sided |
| One-sided requires justification | true |
| Normality check | always |
| Variance inequality action | welch |
| Declared outlier method | iqr_1.5 |
| Same outlier rule within dataset family | true |
| Multiple-comparison correction method | holm |
| Multiple-comparison family scope | per_figure |
| Effect size required | true |
| Confidence interval required | true |
| Exact p-values required | true |
| Same exclusion within dataset family | true |
| Same preprocessing within dataset family | true |
These defaults are intentionally conservative. If your project has not stored a custom policy, the audit still runs against a known baseline instead of leaving the project's statistical expectations undefined.
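As an illustration, the defaults above could be represented as a nested structure like the following. The key names and nesting are hypothetical and chosen for readability; they are not Licklider's actual schema.

```python
# Hypothetical representation of the built-in default policy described
# in the table above. Key names are illustrative, not Licklider's schema.
DEFAULT_POLICY = {
    "schema_version": 1,
    "alpha": {"default": 0.05, "exceptions_allowed": True},
    "sidedness": {"default": "two_sided", "one_sided_requires_justification": True},
    "assumptions": {"normality_check": "always", "variance_inequality_action": "welch"},
    "outliers": {"declared_method": "iqr_1.5", "same_rule_within_dataset_family": True},
    "multiple_comparisons": {"correction_method": "holm", "family_scope": "per_figure"},
    "reporting": {
        "effect_size_required": True,
        "confidence_interval_required": True,
        "exact_p_values_required": True,
    },
    "dataset_family": {
        "same_exclusion_required": True,
        "same_preprocessing_required": True,
    },
}
```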
How the Project Audit works
The Project Audit is derived on demand. Licklider does not persist a saved audit report and then reuse it later. Each time the Project Audit panel is opened or refreshed, Licklider rebuilds the report against the current effective policy and the currently collected figure evidence.
The model has three layers:
Evidence
When a figure is generated, Licklider stores policy evidence with that figure's metadata. This captures the policy state that was in effect when the figure was created and preserves the basis on which that figure was originally produced.
Derived
When the Project Audit runs, Licklider compares stored figure evidence and figure-level reporting signals against the current effective policy. This derived pass produces:
- a summary
- cross-figure consistency checks
- per-figure compliance results
- an approximate multiple-testing burden estimate
- disclosure suggestions
- follow-up suggestions
These derived results are recalculated each time the audit is requested.
Exception
An exception workflow is planned but not yet implemented. The intended future design is that a deliberate policy deviation can be acknowledged with a reason and tied to the policy version under which it was approved.
If an explicit stored project policy exists but is unreadable, the audit does not silently continue with a different policy. Instead, the audit is blocked until the project policy is reconfigured.
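The resolution rules above can be sketched as follows. The function and exception names are assumed for the example, and a JSON blob stands in for however the policy is actually stored.

```python
import json

class PolicyUnreadableError(Exception):
    """Raised when an explicit stored policy exists but cannot be read."""

def resolve_effective_policy(stored_policy_blob, default_policy):
    # No explicit stored policy: fall back to the built-in default.
    if stored_policy_blob is None:
        return default_policy
    try:
        return json.loads(stored_policy_blob)
    except ValueError:
        # An explicit policy that cannot be read blocks the audit
        # rather than silently continuing with a different policy.
        raise PolicyUnreadableError("project policy must be reconfigured")
```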
What the current audit evaluates
The current Project Audit panel is organized into several sections.
Summary
The summary reports high-level counts such as:
- total figures
- total figures with a primary statistical test
- compliance rate
- critical deviations
- warnings
- approximate family-wise error rate (FWER) estimate
It also surfaces evidence-state counts, including figures without policy evidence, figures with malformed evidence, figures with unusable evidence, and figures with stale policy snapshots.
Consistency checks
The current audit runs four cross-figure checks:
Alpha Consistency
Checks whether comparable figures expose an alpha value and, when comparable alpha values exist, whether those values match policy.alpha.default. If alpha exceptions are allowed, mismatches are surfaced as warnings rather than hard violations.
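The severity decision for this check can be sketched as below. The return values and function name are illustrative, not Licklider's actual API.

```python
def alpha_check_result(figure_alpha, policy_alpha, exceptions_allowed):
    # No comparable alpha exposed: nothing to evaluate for this figure.
    if figure_alpha is None:
        return "unchecked"
    if figure_alpha == policy_alpha:
        return "pass"
    # Mismatch: a warning if alpha exceptions are allowed by the policy,
    # otherwise a hard violation.
    return "warning" if exceptions_allowed else "violation"
```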
Test Selection Consistency
Checks whether figures with the same comparable grouping signals use the same primary test choice. The current v1 grouping basis compares group count, pairing state, normality outcome, and variance outcome before asking whether the selected test names diverge.
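A minimal sketch of this grouping basis, assuming dictionary-shaped figure records with hypothetical field names:

```python
def grouping_key(figure):
    # The v1 grouping basis: group count, pairing state, normality
    # outcome, and variance outcome.
    return (
        figure["group_count"],
        figure["paired"],
        figure["normality_outcome"],
        figure["variance_outcome"],
    )

def divergent_test_groups(figures):
    # Map each grouping key to the set of primary test names seen for it.
    tests_by_key = {}
    for fig in figures:
        tests_by_key.setdefault(grouping_key(fig), set()).add(fig["primary_test"])
    # Flag groups where comparable figures chose different tests.
    return {key: tests for key, tests in tests_by_key.items() if len(tests) > 1}
```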
Reporting Completeness
Checks whether figures with a primary test include effect size and confidence interval reporting when the live policy requires them.
Declared Outlier Method Consistency
Checks whether figures in the same dataset family declare the same outlier method when the policy requires a shared rule within that family.
Figure compliance
Each figure is also checked individually against the current effective policy. The current v1 figure-compliance surface checks four fields:
- alpha
- declared outlier method
- effect size presence
- confidence interval presence
If a field cannot be evaluated because the necessary signal is missing, it is counted as unchecked rather than being treated as a violation.
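The four-field pass, including the unchecked handling, can be sketched like this. Field and policy key names are assumed for the example.

```python
def _compare(value, expected):
    # A missing signal is "unchecked", never a violation.
    if value is None:
        return "unchecked"
    return "pass" if value == expected else "violation"

def _presence(reported):
    if reported is None:
        return "unchecked"
    return "pass" if reported else "violation"

def figure_compliance(figure, policy):
    # The current v1 figure-compliance surface checks four fields.
    return {
        "alpha": _compare(figure.get("alpha"), policy["alpha_default"]),
        "outlier_method": _compare(
            figure.get("declared_outlier_method"), policy["declared_outlier_method"]
        ),
        "effect_size": _presence(figure.get("effect_size_reported")),
        "confidence_interval": _presence(figure.get("ci_reported")),
    }
```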
Multiple Testing Burden
The multiple-testing burden section is informational. It does not rewrite p-values or automatically apply a correction.
The current v1 estimate is a project-wide approximate FWER based on the count of figures that expose a primary test. The estimate uses the effective policy's default alpha:
1 - (1 - alpha)^n
where n is the number of figures with a primary test.
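The formula can be evaluated directly. For example, ten figures with a primary test at the default alpha of 0.05 give an approximate FWER of about 0.40:

```python
def approximate_fwer(alpha, n):
    # Probability of at least one false positive across n tests at
    # significance level alpha, assuming independence: 1 - (1 - alpha)^n.
    return 1.0 - (1.0 - alpha) ** n

print(round(approximate_fwer(0.05, 10), 3))  # ≈ 0.401
```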
The policy's family_scope is recorded in the report, but it does not yet change the v1 burden calculation.
Required Disclosures
The audit can generate disclosure text that should be considered for the Methods or reporting sections of a paper. These disclosures are derived from the current report state and can be copied from the Project Audit panel.
Suggestions
The audit also provides follow-up suggestions, especially for stale policy snapshots, missing policy evidence, malformed evidence, and other situations where the project should be reviewed even if the issue is not represented as a single hard violation.
Policy evidence and stale snapshots
Policy evidence matters because projects evolve over time. When a figure is created, Licklider stores the policy basis that was active at generation time. If the effective project policy changes later, that does not silently rewrite the old figure.
Instead, the audit compares the current policy hash with the stored policy hash in the figure's evidence. If they differ, the figure is counted as having a stale policy snapshot.
This does not automatically mean the figure is wrong. It means the figure was created under a different policy state than the one the project currently advertises.
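A sketch of the hash comparison, assuming the policy hash is a digest of a canonical JSON form and that evidence records carry a policy_hash field (both assumptions for this example):

```python
import hashlib
import json

def policy_hash(policy):
    # Hash a canonical JSON form so equivalent policies hash identically.
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def snapshot_state(current_hash, evidence):
    if evidence is None:
        return "missing"      # no policy evidence stored with the figure
    stored = evidence.get("policy_hash")
    if stored is None:
        return "malformed"    # evidence exists but lacks a usable hash
    # A differing hash marks the figure's snapshot as stale, not wrong.
    return "current" if stored == current_hash else "stale"
```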
For the broader provenance model around dataset revisions, version snapshots, and reproducibility bundles, see Versioning and Provenance and Reproducibility Package.
Dataset families
Some project-level consistency checks work at the level of a dataset family rather than at the level of the whole project.
In the current implementation, a dataset family is built from the figure's source_dataset_artifact_id. In practical terms, figures that point back to the same source dataset artifact are treated as belonging to the same family.
This matters most for declared outlier-method consistency, because the current policy can require the same rule within a dataset family while allowing different families to use different declared methods.
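Family grouping and the shared outlier rule can be sketched together, assuming dictionary-shaped figure records with the field names used below:

```python
def dataset_families(figures):
    # Figures pointing at the same source dataset artifact form a family.
    families = {}
    for fig in figures:
        artifact_id = fig.get("source_dataset_artifact_id")
        if artifact_id is not None:
            families.setdefault(artifact_id, []).append(fig)
    return families

def families_with_mixed_outlier_methods(figures):
    # Flag families whose members declare more than one outlier method.
    flagged = []
    for artifact_id, members in dataset_families(figures).items():
        methods = {m.get("declared_outlier_method") for m in members}
        if len(methods) > 1:
            flagged.append(artifact_id)
    return flagged
```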
Related pages
- For automatic test-selection behavior at the figure level, see Choose the Right Test.
- For normality and equal-variance checks that feed into automatic switching, see Normality and Homoscedasticity.
- For the broader claim-bearing assumption gate, see Assumption and Robustness Guard.
Current limitations
- Project policy editing is not yet exposed in the current UI. If no explicit stored policy exists, Licklider uses the built-in default policy.
- The v1 audit surface is narrower than the full policy schema. It does not yet expose dedicated audit checks for sidedness justification, exact p-values, same exclusion within dataset families, or same preprocessing within dataset families.
- The exception workflow is not implemented yet. Deliberate deviations are not currently stored as approved exceptions tied to a policy version.
- The multiple-testing burden is a project-wide approximate estimate based on figures with a primary test. The recorded family_scope does not yet change the v1 calculation.
- The audit scope currently covers all figure artifacts, including historical revisions, rather than only the latest figure version in each lineage.
- Dataset-family detection is based on direct source_dataset_artifact_id matching. Derived datasets or downstream subsets are not yet grouped into a richer lineage-aware family model.
- If an explicit stored project policy is unreadable, the audit is unavailable until the policy is reconfigured.