Versioning and Provenance
How Licklider tracks dataset versions, records where each figure came from, and packages everything needed to reproduce a result.
Reproducibility in research requires more than showing a result. It requires being able to answer: which data was used, which preprocessing steps were applied, which version of the analysis code ran, and what the statistical parameters were. Licklider records all of this automatically, for every figure.
Dataset versioning
Every change to a dataset creates a new revision. Revisions are numbered sequentially and are created when:
- A file is uploaded or re-uploaded
- A preprocessing action is applied
- The Data Contract is updated
Each revision is a snapshot of the dataset's state at that point. When a figure is generated, it is linked to the specific revision it was built from. If the dataset changes after a figure is generated, Licklider detects the mismatch and flags the figure for review.
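The revision-to-figure link and the mismatch check described above can be sketched as a content-hash comparison. This is an illustrative sketch only; the record layout, field names, and hashing scheme are assumptions, not Licklider's actual internals.

```python
import hashlib
import json

def revision_fingerprint(rows):
    """Hash a dataset revision's rows into a stable fingerprint (illustrative)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def figure_needs_review(figure, current_rows):
    """Flag a figure when the dataset no longer matches the revision it was built from."""
    return figure["revision_fingerprint"] != revision_fingerprint(current_rows)

# A figure generated from revision 3 of a dataset:
rows_r3 = [{"group": "A", "value": 1.2}, {"group": "B", "value": 3.4}]
figure = {"id": "fig-1", "revision_fingerprint": revision_fingerprint(rows_r3)}

# A later upload creates revision 4; the figure is now flagged for review:
rows_r4 = rows_r3 + [{"group": "B", "value": 2.9}]
print(figure_needs_review(figure, rows_r4))  # True
print(figure_needs_review(figure, rows_r3))  # False
```

Hashing a canonical serialization of the rows means any change, however small, produces a different fingerprint, which is exactly the property a mismatch detector needs.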
The current revision and a summary of recent changes are visible in the Data Contract overview in the Inspector.
Figure provenance
Every figure Licklider generates carries a provenance record, which includes:
- The version of the Licklider application that ran
- The version of the statistical engine
- The version of the ruleset used for quality checks
- The date and time the figure was generated
- The preprocessing steps applied to the dataset, including any exclusions or transformations
- The statistical parameters used in the calculation
This information is visible in the Assurance panel of the figure Inspector under the Provenance section.
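Taken together, the fields above make the record self-describing: a reviewer can answer "what produced this?" with a single lookup. A minimal sketch of such a record follows; the field names and values are illustrative assumptions, not Licklider's actual schema.

```python
# Illustrative shape of a figure provenance record. Field names and
# values are assumptions for the sake of example, not Licklider's schema.
provenance = {
    "app_version": "1.4.2",
    "stats_engine_version": "0.9.1",
    "ruleset_version": "2024.06",
    "generated_at": "2024-06-01T12:30:00Z",
    "preprocessing": [
        {"step": "exclude_rows", "reason": "missing outcome", "count": 3},
        {"step": "log_transform", "column": "value"},
    ],
    "stat_params": {"test": "welch_t", "alpha": 0.05},
}

# Each question a reviewer might ask maps to one key:
print(provenance["stat_params"]["test"])
print(len(provenance["preprocessing"]), "preprocessing steps recorded")
```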
Patch history
When a figure is edited after generation (for example, changing axis labels or adjusting a style setting), each change is recorded with a provenance tag indicating whether it was made by you, by Licklider, or automatically. The full edit history is visible in the History panel.
Policy evidence
Each figure also records policy evidence describing the project statistical policy that was active when the figure was generated. This evidence is stored with the figure's metadata and gives the Project Audit a stable basis for comparing old figure state against the project's current effective policy.
If the project policy changes later, the old figure is not silently rewritten. Instead, the Project Audit can detect that the stored policy snapshot no longer matches the live policy and suggest re-review. For details, see Project Statistical Policy and Consistency Audit.
Data Contract and figure consistency
The Data Contract records how the dataset is structured: which columns identify groups, what the outcome type is, what the observation unit is, and so on. When a figure is generated, the current state of the Data Contract is frozen alongside it.
This freeze is intentional. It preserves the exact structural assumptions under which the figure was created, so that a later correction to the dataset definition does not silently rewrite the meaning of an earlier result.
If the Data Contract is later changed, for example, if the outcome type is corrected or the subject ID column is reassigned, Licklider detects that the frozen state no longer matches the current state. Figures that were generated under the previous Data Contract are flagged for re-evaluation.
This means you do not need to manually track which figures were built under which assumptions. Licklider does this for you.
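The frozen-versus-current comparison can be sketched as a field-level diff over the contract. The function and field names below are illustrative assumptions, not Licklider's implementation.

```python
def contract_diff(frozen, current):
    """List Data Contract fields whose current value differs from the frozen snapshot."""
    keys = set(frozen) | set(current)
    return sorted(k for k in keys if frozen.get(k) != current.get(k))

# Hypothetical contract fields, frozen at figure-generation time:
frozen = {"group_column": "treatment", "outcome_type": "continuous", "unit": "subject"}

# The outcome type is later corrected, so the figure is flagged:
current = {"group_column": "treatment", "outcome_type": "binary", "unit": "subject"}

changed = contract_diff(frozen, current)
if changed:
    print("Flag figure for re-evaluation; changed fields:", changed)
```

A field-level diff (rather than a single hash) lets the flag name exactly which structural assumption changed, which is what a re-evaluation prompt needs to show.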
Reproducibility bundle
For every figure, Licklider automatically generates a reproducibility bundle. This is a self-contained package containing everything needed to understand and verify the result.
Contents of the bundle:
| File | Contents |
|---|---|
| fig_ir.json | The canonical figure specification used to generate the output |
| stats_meta.json | Statistical parameters, test results, and effect sizes |
| version_snapshot.json | Licklider version, stats engine version, ruleset version |
| plotly.json | The rendered figure data |
| preprocess_report.json | Preprocessing steps applied, including exclusions |
| raw.csv | The original data before preprocessing |
| processed.csv | The data after preprocessing |
| manifest.json | A signed manifest with checksums for all files |
| repro.py | A Python script to re-render the figure from the bundle |
| repro.R | An R script to re-render the figure from the bundle |
| README.md | Instructions for using the bundle |
The bundle can be downloaded from the figure Inspector.
The bundle includes both raw.csv and processed.csv because reproducibility often depends on showing not only the final values used for the figure, but also the exact preprocessing path that transformed the original dataset into those values.
The signed manifest.json exists so that each file in the bundle can be checked against its recorded checksum. This helps detect accidental modification, incomplete transfer, or mismatch between files before the bundle is reused.
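Checksum verification along these lines can be done with standard tooling. The sketch below assumes a manifest layout of {"files": {name: sha256hex}}; that layout, and the function name, are assumptions for illustration, not Licklider's actual format (and it skips signature verification entirely).

```python
import hashlib
import json
import tempfile
from pathlib import Path

def verify_bundle(bundle_dir):
    """Check each file in the bundle against the checksum recorded in manifest.json.

    Assumes an illustrative manifest layout: {"files": {filename: sha256hex}}.
    Returns a list of (filename, problem) pairs; an empty list means all files match.
    """
    bundle = Path(bundle_dir)
    manifest = json.loads((bundle / "manifest.json").read_text())
    failures = []
    for name, expected in manifest["files"].items():
        path = bundle / name
        if not path.exists():
            failures.append((name, "missing"))
            continue
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
        if actual != expected:
            failures.append((name, "checksum mismatch"))
    return failures

# Build a tiny mock bundle to exercise the check:
tmp = Path(tempfile.mkdtemp())
(tmp / "raw.csv").write_text("group,value\nA,1.2\n")
digest = hashlib.sha256((tmp / "raw.csv").read_bytes()).hexdigest()
(tmp / "manifest.json").write_text(json.dumps({"files": {"raw.csv": digest}}))

print(verify_bundle(tmp))  # [] means every file matched its checksum
```

Running the check before reuse catches exactly the failure modes named above: a missing file, a truncated transfer, or a file edited after packaging.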
For the downloadable ZIP that combines the canonical bundle with browser-captured PNG and SVG exports, see Reproducibility Package.
What the bundle enables:
- A reviewer or collaborator can verify the exact data and parameters that produced the figure
- The repro.py or repro.R script can be run to re-render the figure independently of Licklider
- The bundle can be imported back into Licklider, which verifies the manifest checksums before loading
Limitations: The re-render scripts reproduce the figure output from the stored parameters. They do not re-run the full server-side analysis pipeline from scratch. Full pipeline re-execution requires re-running the analysis in Licklider itself.
Design rationale
This page follows a simple rule: a result should remain traceable to the exact data state, structural assumptions, and rendering inputs that produced it. That is why Licklider versions datasets revision by revision, freezes the Data Contract at figure generation time, and packages the rendered result together with its provenance metadata.
The bundle is designed to answer two different questions at once: "What produced this figure?" and "Can I verify or re-render it outside the original session?" That is why it contains both human-readable guidance (README.md) and machine-checkable assets such as checksums, version snapshots, and rendering files.
Licklider does not claim that the downloadable scripts recreate the entire server-side pipeline from first principles. Instead, the design goal is narrower and more practical: preserve enough information to audit the result, re-render the stored output, and detect when later dataset or contract changes mean a figure should be reviewed.
What is not tracked
- Changes made to data outside Licklider before upload are not tracked. The provenance record begins at the point of upload.
- Figures exported as PNG or SVG images do not contain embedded provenance metadata. Provenance is held in the reproducibility bundle.
- A comparison view showing side-by-side differences between dataset revisions is not currently available. Revision summaries show counts of changed rows and cells rather than a full diff.
What this page does not cover
- How preprocessing steps are recorded — see Preprocessing Audit Log
- How outlier exclusions are recorded — see Outlier Exclusion Log
- How sample size changes across time points are tracked — see N Disclosure and Attrition Trail