Variable and ID Mapping

How Licklider infers column roles and dataset-level properties from your data, and how to review the results.

When a dataset is loaded, Licklider infers a set of column-linked and dataset-level fields in the Data Contract — for example, which columns look like groups or subject IDs, what kind of outcome the dataset contains, and whether the rows appear to be raw observations or summaries.

This page covers dataset-level mapping in the Data Contract — not just variable names or IDs, but the full set of structural properties that downstream quality checks and analysis depend on.


What gets inferred automatically

Immediately after a file is uploaded, Licklider analyzes the column profiles and records the following in the Data Contract:

  • Subject ID columns — Columns that identify the observation unit across rows
  • Group columns — Columns that define experimental groups or conditions
  • Control group — Which group value represents the control or reference
  • Timepoint columns — Columns that record time or study stage
  • Batch ID columns — Columns that identify processing batch
  • Plate ID columns — Columns that identify assay plate
  • Run order columns — Columns that record run sequence
  • Outcome type — The type of outcome variable (continuous, binary, count, proportion, survival)
  • Row grain — Whether each row is a raw observation or a group summary
  • Hierarchical structure — Nesting relationships between observation levels
  • Preprocessing applied — Whether values appear to have been pre-processed
  • Limit of detection — Whether values appear to include detection-limit substitutions

Each field is assigned a confidence label indicating how strongly the current mapping is supported by the data profile. Lower-confidence fields should be reviewed before relying on the checks or suggestions that depend on them.
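The fields above, together with their confidence and provenance, can be pictured as a small record per field. The sketch below is an illustrative shape only — the field names, confidence labels, and structure are assumptions for exposition, not Licklider's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ContractField:
    """One inferred Data Contract field (illustrative shape, not Licklider's schema)."""
    name: str
    value: object
    provenance: str = "inferred_from_data"  # or "user_declared"
    confidence: str = "high"                # "high" | "medium" | "low" (assumed labels)
    needs_review: bool = False

contract = [
    ContractField("subject_id_columns", ["mouse_id"], confidence="medium"),
    ContractField("group_columns", ["treatment"]),
    ContractField("outcome_type", "continuous", confidence="low", needs_review=True),
]

# Lower-confidence fields are the ones to check before trusting downstream results.
to_review = [f.name for f in contract if f.needs_review or f.confidence == "low"]
```

In this sketch, reviewing the contract amounts to scanning for low-confidence or flagged fields before any dependent checks are trusted.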


What you can confirm or change

Via the Observation Unit Declaration wizard
The observation unit type and subject ID column can be confirmed or corrected directly. This is the most consequential mapping and is surfaced as an explicit step. See Observation Unit Declaration.

Via Chat
Some analysis-relevant mappings can be clarified in Chat. For example, you can tell Licklider which column is the group column, or confirm that the outcome should be treated as binary rather than continuous. Licklider will update its working assumptions accordingly.

By re-uploading a cleaner file
If the underlying column profile is wrong — for example, because non-numeric values are mixed into a numeric column, or because column names don't reflect their content — correcting the file and re-uploading will trigger a fresh inference from the improved structure.


How to read the Data Contract overview

The Data Contract overview is visible in the Inspector panel of the dataset view. It shows the key inferred fields alongside their current values, how they were established, and whether any fields need review.

How a mapping was established:

  • Inferred from data — determined automatically from the column profile
  • User-declared — confirmed or set by you

Fields marked as needing review should be checked before relying on quality checks that depend on them.


What you will see in practice

In day-to-day use, this mapping appears as a structured overview of the dataset, rather than as a single pass-or-fail result. You should expect to see:

  • The current value for each mapped field, such as the subject ID column, group column, outcome type, or row grain
  • How each field was established, including whether it was inferred from data or explicitly confirmed by you
  • A confidence label that signals how strongly the current mapping is supported by the column profile
  • A review state when the current mapping is plausible but not secure enough to rely on without checking
  • Conflict or review prompts when the mapped structure does not fit how the dataset is being analyzed

These outputs do not themselves produce a p-value, effect size, or figure. Instead, they determine which downstream checks, analysis suggestions, and figure-level assignments Licklider treats as appropriate for the dataset.


Key fields and what they affect

Group columns
Used to identify which observations belong to which experimental condition. This feeds into group comparison analyses and batch confounding detection. If the wrong column is identified as the group column, comparisons may be structured incorrectly.

Subject ID columns
Used for paired analyses and pseudoreplication detection. Set through the Observation Unit Declaration wizard when applicable, or inferred from column name patterns and uniqueness.
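A name-pattern-plus-uniqueness heuristic of this kind can be sketched in a few lines. The patterns and thresholds below are illustrative assumptions, not Licklider's actual rules:

```python
import re

def looks_like_subject_id(name, values, min_repeat=2):
    """Heuristic sketch: flag a column whose name matches ID-like patterns
    and whose values repeat across rows (one subject, several observations).
    Patterns and the min_repeat threshold are illustrative assumptions."""
    name_hit = re.search(r"(subject|animal|mouse|patient|id)", name, re.IGNORECASE)
    n, n_unique = len(values), len(set(values))
    # An ID column for repeated measures has fewer unique values than rows.
    repeats = n_unique < n and n / max(n_unique, 1) >= min_repeat
    return bool(name_hit) and repeats
```

For example, a `mouse_id` column with two rows per animal would match, while a measurement column like `weight` would not, both because of its name and because its values rarely repeat.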

Timepoint columns
Used to structure repeated measures analyses and to monitor sample attrition across time points. Temporal columns are detected from value patterns and column names.

Outcome type
Determines which statistical tests and quality checks are appropriate. Continuous outcomes support t-tests and ANOVA. Binary outcomes suggest logistic regression. Survival outcomes trigger survival-specific checks.
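Conceptually, this is a lookup from outcome type to suitable analyses. The mapping below is a sketch: the continuous, binary, and survival entries follow the examples above, while the count and proportion entries are standard statistical pairings assumed for illustration, not a documented Licklider menu:

```python
# Illustrative mapping from inferred outcome type to plausible analyses.
# The exact menu Licklider uses is not specified here.
SUITABLE_ANALYSES = {
    "continuous": ["t-test", "ANOVA"],
    "binary": ["logistic regression"],
    "count": ["Poisson or negative binomial regression"],
    "proportion": ["logistic or beta regression"],
    "survival": ["Kaplan-Meier", "Cox regression"],
}

def suggestions(outcome_type):
    # An unrecognized outcome type yields no suggestions rather than a wrong one.
    return SUITABLE_ANALYSES.get(outcome_type, [])
```

The point of the lookup is the failure mode it prevents: a misinferred outcome type changes which row is consulted, which is exactly how unsuitable suggestions arise.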

Row grain
Distinguishes between datasets where each row is a raw observation and datasets where each row is already a group summary (e.g., mean ± SD per group). This affects whether individual-level quality checks apply.
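One simple signal for row grain is the presence of aggregate-style column names combined with roughly one row per group. The sketch below shows the idea; the marker names and the one-row-per-group rule are illustrative assumptions, not Licklider's implementation:

```python
def infer_row_grain(columns, n_rows, n_groups):
    """Sketch of a row-grain heuristic: summary tables tend to expose
    aggregate column names (mean, sd, sem, n) and have about one row per
    group. Marker names and thresholds are illustrative assumptions."""
    summary_markers = {"mean", "sd", "sem", "se", "n"}
    has_summary_cols = any(c.lower() in summary_markers for c in columns)
    one_row_per_group = bool(n_groups) and n_rows <= n_groups
    if has_summary_cols and one_row_per_group:
        return "group_summary"
    return "raw_observation"
```

A table with columns `group, mean, sd, n` and two rows for two groups would be classified as a group summary; forty rows of `group, weight` would not.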


What happens when fields are inferred incorrectly

Quality checks and analysis that depend on incorrectly inferred fields may be based on the wrong assumptions. The most common consequences are:

  • Group comparisons structured around the wrong column
  • Pseudoreplication checks that miss actual replication because the subject ID column was not identified
  • Survival analyses not triggered because the event column was not recognized as binary

When a conflict is detected — for example, when the inferred row grain is inconsistent with how the analysis is structured — the Inspector will flag the field for review.
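The row-grain conflict mentioned above can be pictured as a small consistency check. The analysis names and the returned review record are hypothetical, meant only to show the shape of the check:

```python
def check_row_grain_conflict(row_grain, requested_analysis):
    """Illustrative conflict check: if the requested analysis needs individual
    observations but the rows are group summaries, flag the field for review.
    Analysis names and the record format are assumptions, not Licklider's API."""
    needs_raw_rows = requested_analysis in {"t-test", "paired t-test", "ANOVA"}
    if needs_raw_rows and row_grain == "group_summary":
        return {
            "field": "row_grain",
            "status": "needs_review",
            "reason": "analysis expects raw observations, rows look like summaries",
        }
    return None  # no conflict detected
```

A check like this never rewrites the mapping itself; it only surfaces the disagreement so the user can resolve it.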

If you see unexpected behavior in quality checks or analysis suggestions, reviewing the Data Contract overview is a good first step.


What Licklider can and cannot determine automatically

Licklider can infer likely structural fields from column names, value patterns, uniqueness, and repeated-row structure. It can also detect some conflicts between the current mapping and the way a dataset is being used.

However, Licklider cannot determine structure with certainty when the relevant information is missing, ambiguous, or scientifically misdeclared. In particular, Licklider cannot reliably determine:

  • Whether a column that looks like an ID is the biologically correct unit for replication
  • Whether apparently independent rows are actually nested within a missing higher-level unit
  • Whether a group-like column reflects the intended experimental contrast rather than a batch, site, or label
  • Whether an outcome column that mixes encodings, summaries, and raw values should be interpreted as one outcome type

These limits matter because a structurally wrong mapping can propagate into later decisions. A missed subject ID can make pseudoreplication harder to detect, a wrong group column can misstate the comparison being tested, and an incorrect outcome type can surface unsuitable analysis suggestions.

That is why lower-confidence fields, review prompts, and user confirmation paths are part of the workflow rather than optional extras.


Relationship to figure-level column roles

The Data Contract mapping operates at the dataset level. When you request a specific figure or analysis, Licklider also assigns analysis-specific roles to columns for that figure — for example, which column is the outcome variable and which is the grouping variable for a particular t-test.

These figure-level assignments are separate from the Data Contract. They draw on the Data Contract as context but are determined per figure based on your request. For more detail on figure-level roles, see Required and Optional Columns.
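The separation can be illustrated with two structures: a stable dataset-level contract, and per-figure roles derived from it on request. Everything here — the dictionary keys, the request format, the helper name — is a hypothetical sketch of the split, not Licklider's actual data model:

```python
# Dataset-level contract: stable facts about the data (illustrative keys).
contract = {"group_columns": ["treatment"], "outcome_type": "continuous"}

def assign_figure_roles(request, contract):
    """Hypothetical per-figure role assignment: pick concrete columns for one
    figure, reading the contract as context without modifying it."""
    return {
        "outcome": request["y"],
        # Fall back to the contract's group column when the request names none.
        "grouping": request.get("by", contract["group_columns"][0]),
    }

roles = assign_figure_roles({"kind": "t-test", "y": "weight"}, contract)
```

Note that `assign_figure_roles` only reads `contract`; a single figure request cannot rewrite the dataset-level assumptions, which mirrors the design rationale below.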


Design rationale

This page follows a simple rule: dataset structure should be made explicit enough that downstream checks do not silently rest on hidden assumptions. That is why Licklider performs automatic mapping immediately after upload, shows confidence instead of pretending every inference is equally reliable, and separates dataset-level mapping from figure-level column roles.

The split between dataset-level and figure-level roles is intentional. Dataset-level mapping captures structural facts that should remain stable across many analyses, while figure-level roles are chosen in the context of a specific question. Keeping them separate helps prevent a single analysis request from silently rewriting the broader dataset assumptions.

The correction paths are also deliberate. Direct confirmation is used for high-consequence structure such as observation units, Chat can clarify ambiguous intent without requiring a full re-import, and re-uploading is the right path when the source file itself misrepresents the data.


What this page does not cover