Project Statistical Policy and Consistency Audit

How Licklider evaluates a project's effective statistical policy, compares stored figure evidence against that live policy, and summarizes cross-figure consistency risks.

Licklider can audit statistical consistency at the project level. It does this by evaluating an effective project policy, comparing stored figure evidence against that live policy, and summarizing cross-figure consistency issues in the Project Audit panel.

This page explains what the policy means, what the current audit evaluates, how policy evidence is stored with figures, and where the current support boundary sits.


What is a project statistical policy?

A project statistical policy is a structured ruleset that defines how a project is expected to handle core statistical decisions. It includes defaults for alpha, sidedness, assumption handling, declared outlier method, multiple-comparison settings, and reporting expectations.

Every audit is evaluated against an effective policy:

  • If the project has an explicit stored policy and it can be read successfully, that policy is used.
  • If the project has no explicit stored policy, Licklider falls back to the built-in default policy.
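The fallback rules above can be sketched in code. This is an illustrative sketch, not Licklider's actual API: the function name `resolve_effective_policy`, the exception name, and the abridged defaults dict are all assumptions for the example.

```python
import json

# Abridged stand-in for the built-in defaults (full values are listed
# in the "Default policy" table on this page).
BUILT_IN_DEFAULT_POLICY = {"alpha": {"default": 0.05}}

class PolicyUnreadableError(Exception):
    """Raised when an explicit stored policy exists but cannot be parsed."""

def resolve_effective_policy(stored_json):
    """Return the stored policy if present and readable, else the defaults.

    An unreadable stored policy blocks the audit rather than silently
    falling back (see the note later on this page).
    """
    if stored_json is None:
        return BUILT_IN_DEFAULT_POLICY
    try:
        return json.loads(stored_json)
    except json.JSONDecodeError as exc:
        raise PolicyUnreadableError("reconfigure the project policy") from exc
```

Note that the missing-policy case and the unreadable-policy case are deliberately different: only the former falls back to defaults.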

The policy model is broader than the current audit surface. The current audit focuses on the parts of the policy that are already captured reliably in figure evidence and figure-level metadata.


Default policy

When no explicit project policy is stored, Licklider uses the following default values:

Setting                                     Default
-------                                     -------
Schema version                              1
Significance level (alpha.default)          0.05
Alpha exceptions allowed                    true
Default sidedness                           two_sided
One-sided requires justification            true
Normality check                             always
Variance inequality action                  welch
Declared outlier method                     iqr_1.5
Same outlier rule within dataset family     true
Multiple-comparison correction method       holm
Multiple-comparison family scope            per_figure
Effect size required                        true
Confidence interval required                true
Exact p-values required                     true
Same exclusion within dataset family        true
Same preprocessing within dataset family    true

These defaults are intentionally conservative. If your project has not stored a custom policy, the audit still runs against a known baseline instead of leaving the project's statistical expectations undefined.
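For reference, the default table could be encoded as a configuration structure like the following. The key names and nesting here are assumptions chosen to mirror the table labels; Licklider's actual policy schema may differ.

```python
# Illustrative encoding of the default policy table above.
# Key names are assumptions, not Licklider's actual schema.
DEFAULT_POLICY = {
    "schema_version": 1,
    "alpha": {"default": 0.05, "exceptions_allowed": True},
    "sidedness": {
        "default": "two_sided",
        "one_sided_requires_justification": True,
    },
    "assumptions": {
        "normality_check": "always",
        "variance_inequality_action": "welch",
    },
    "outliers": {
        "declared_method": "iqr_1.5",
        "same_rule_within_family": True,
    },
    "multiple_comparison": {"method": "holm", "family_scope": "per_figure"},
    "reporting": {
        "effect_size_required": True,
        "confidence_interval_required": True,
        "exact_p_values_required": True,
    },
    "consistency": {
        "same_exclusion_within_family": True,
        "same_preprocessing_within_family": True,
    },
}
```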


How the Project Audit works

The Project Audit is derived on demand. Licklider does not persist a saved audit report and then reuse it later. Each time the Project Audit panel is opened or refreshed, Licklider rebuilds the report against the current effective policy and the currently collected figure evidence.

The model has three layers:

Evidence

When a figure is generated, Licklider stores policy evidence with that figure's metadata. This captures the policy state that was in effect when the figure was created and preserves the basis on which that figure was originally produced.

Derived

When the Project Audit runs, Licklider compares stored figure evidence and figure-level reporting signals against the current effective policy. This derived pass produces:

  • a summary
  • cross-figure consistency checks
  • per-figure compliance results
  • an approximate multiple-testing burden estimate
  • disclosure suggestions
  • follow-up suggestions

These derived results are recalculated each time the audit is requested.

Exception

An exception workflow is planned but not yet implemented. The intended future design is that a deliberate policy deviation can be acknowledged with a reason and tied to the policy version under which it was approved.

If an explicit stored project policy exists but is unreadable, the audit does not silently continue with a different policy. Instead, the audit is blocked until the project policy is reconfigured.


What the current audit evaluates

The current Project Audit panel is organized into several sections.

Summary

The summary reports high-level counts such as:

  • total figures
  • total figures with a primary statistical test
  • compliance rate
  • critical deviations
  • warnings
  • an approximate family-wise error rate (FWER) estimate

It also surfaces evidence-state counts, including figures without policy evidence, figures with malformed evidence, figures with unusable evidence, and figures with stale policy snapshots.

Consistency checks

The current audit runs four cross-figure checks:

Alpha Consistency
Checks whether comparable figures expose an alpha value and, when comparable alpha values exist, whether those values match policy.alpha.default. If alpha exceptions are allowed, mismatches are surfaced as warnings rather than hard violations.
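The warning-versus-violation behavior of the alpha check can be sketched as follows. Field names (`fig["alpha"]`, `fig["id"]`) are illustrative assumptions, not Licklider's actual evidence schema.

```python
def check_alpha_consistency(figures, policy_alpha, exceptions_allowed):
    """Classify each comparable figure's alpha against the policy default."""
    findings = []
    for fig in figures:
        alpha = fig.get("alpha")
        if alpha is None:
            continue  # figure exposes no comparable alpha: nothing to check
        if alpha != policy_alpha:
            # Allowed exceptions downgrade mismatches to warnings.
            severity = "warning" if exceptions_allowed else "violation"
            findings.append((fig["id"], severity))
    return findings
```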

Test Selection Consistency
Checks whether figures with the same comparable grouping signals use the same primary test choice. The current v1 grouping basis compares group count, pairing state, normality outcome, and variance outcome before asking whether the selected test names diverge.
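The v1 grouping basis described above can be expressed as a key function: figures that agree on all four signals are comparable, and a group is flagged when its members' primary test names diverge. Field names here are assumptions for illustration.

```python
from collections import defaultdict

def grouping_key(fig):
    """v1 grouping basis: group count, pairing, normality, variance."""
    return (fig["group_count"], fig["paired"],
            fig["normality_outcome"], fig["variance_outcome"])

def divergent_test_groups(figures):
    """Return grouping keys whose figures disagree on the primary test."""
    tests_by_key = defaultdict(set)
    for fig in figures:
        tests_by_key[grouping_key(fig)].add(fig["primary_test"])
    return {key: tests for key, tests in tests_by_key.items() if len(tests) > 1}
```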

Reporting Completeness
Checks whether figures with a primary test include effect size and confidence interval reporting when the live policy requires them.

Declared Outlier Method Consistency
Checks whether figures in the same dataset family declare the same outlier method when the policy requires a shared rule within that family.

Figure compliance

Each figure is also checked individually against the current effective policy. The current v1 figure-compliance surface checks four fields:

  • alpha
  • declared outlier method
  • effect size presence
  • confidence interval presence

If a field cannot be evaluated because the necessary signal is missing, it is counted as unchecked rather than being treated as a violation.
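The three-way outcome (pass, violation, unchecked) can be sketched like this. The field names and the flat policy dict are illustrative assumptions.

```python
def check_field(figure, field, expected):
    """Return 'unchecked' for a missing signal, never a violation."""
    value = figure.get(field)
    if value is None:
        return "unchecked"
    return "pass" if value == expected else "violation"

def figure_compliance(figure, policy):
    """Evaluate the four v1 compliance fields for one figure."""
    return {
        "alpha": check_field(figure, "alpha", policy["alpha_default"]),
        "outlier_method": check_field(figure, "outlier_method",
                                      policy["outlier_method"]),
        "effect_size": check_field(figure, "has_effect_size", True),
        "confidence_interval": check_field(figure, "has_confidence_interval", True),
    }
```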

Multiple Testing Burden

The multiple-testing burden section is informational. It does not rewrite p-values or automatically apply a correction.

The current v1 estimate is a project-wide approximate FWER based on the count of figures that expose a primary test. The estimate uses the effective policy's default alpha:

1 - (1 - alpha)^n

where n is the number of figures with a primary test.
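The formula above translates directly into code:

```python
def approximate_fwer(alpha, n):
    """Project-wide approximate FWER: 1 - (1 - alpha)^n,
    where n is the number of figures with a primary test."""
    return 1 - (1 - alpha) ** n
```

For example, with the default alpha of 0.05 and 10 figures carrying a primary test, the estimate is about 0.40, which is why the burden section is worth reading even when every individual figure passes.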

The policy's family_scope is recorded in the report, but it does not yet change the v1 burden calculation.

Required Disclosures

The audit can generate disclosure text that should be considered for the Methods or reporting sections of a paper. These disclosures are derived from the current report state and can be copied from the Project Audit panel.

Suggestions

The audit also provides follow-up suggestions, especially for stale policy snapshots, missing policy evidence, malformed evidence, and other situations where the project should be reviewed even if the issue is not represented as a single hard violation.


Policy evidence and stale snapshots

Policy evidence matters because projects evolve over time. When a figure is created, Licklider stores the policy basis that was active at generation time. If the effective project policy changes later, that does not silently rewrite the old figure.

Instead, the audit compares the current policy hash with the stored policy hash in the figure's evidence. If they differ, the figure is counted as having a stale policy snapshot.

This does not automatically mean the figure is wrong. It means the figure was created under a different policy state than the one the project currently advertises.
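The hash comparison can be sketched as follows. The hashing scheme shown (canonical JSON serialization followed by SHA-256) is an assumption for illustration; the point is only that staleness is detected by comparing stored and current hashes, not by diffing policy contents.

```python
import hashlib
import json

def policy_hash(policy):
    """Hash a policy dict via canonical JSON (assumed scheme)."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def is_stale(figure_evidence, current_policy):
    """A snapshot is stale when the stored hash differs from the live one."""
    return figure_evidence["policy_hash"] != policy_hash(current_policy)
```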

For the broader provenance model around dataset revisions, version snapshots, and reproducibility bundles, see Versioning and Provenance and Reproducibility Package.


Dataset families

Some project-level consistency checks work at the level of a dataset family rather than at the level of the whole project.

In the current implementation, a dataset family is built from the figure's source_dataset_artifact_id. In practical terms, figures that point back to the same source dataset artifact are treated as belonging to the same family.

This matters most for declared outlier-method consistency, because the current policy can require the same rule within a dataset family while allowing different families to use different declared methods.
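Family grouping and the outlier-method check over families can be sketched together. Figure field names other than `source_dataset_artifact_id` (which the page names) are illustrative assumptions.

```python
from collections import defaultdict

def dataset_families(figures):
    """Group figure ids by their shared source_dataset_artifact_id."""
    families = defaultdict(list)
    for fig in figures:
        families[fig["source_dataset_artifact_id"]].append(fig["id"])
    return dict(families)

def outlier_method_conflicts(figures):
    """Families whose figures declare more than one outlier method."""
    methods = defaultdict(set)
    for fig in figures:
        methods[fig["source_dataset_artifact_id"]].add(
            fig["declared_outlier_method"])
    return {family: m for family, m in methods.items() if len(m) > 1}
```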



Current limitations

  • Project policy editing is not yet exposed in the current UI. If no explicit stored policy exists, Licklider uses the built-in default policy.
  • The v1 audit surface is narrower than the full policy schema. It does not yet expose dedicated audit checks for sidedness justification, exact p-values, same exclusion within dataset families, or same preprocessing within dataset families.
  • The exception workflow is not implemented yet. Deliberate deviations are not currently stored as approved exceptions tied to a policy version.
  • The multiple-testing burden is a project-wide approximate estimate based on figures with a primary test. The recorded family_scope does not yet change the v1 calculation.
  • The audit scope currently covers all figure artifacts, including historical revisions, rather than only the latest figure version in each lineage.
  • Dataset-family detection is based on direct source_dataset_artifact_id matching. Derived datasets or downstream subsets are not yet grouped into a richer lineage-aware family model.
  • If an explicit stored project policy is unreadable, the audit is unavailable until the policy is reconfigured.