ID, Batch, and Timepoint Columns

Many datasets include one or more of these column types: an identifier that tracks which observation unit a row belongs to, a batch or plate label that records when or how samples were processed, or a timepoint that records when a measurement was taken. When present, each of these plays a role beyond the outcome and grouping variables. When they are identified correctly, Licklider uses them to run checks that would otherwise be impossible and to support better analysis setup, figures, and interpretation.

ID columns

An ID column identifies the observation unit — a subject, animal, patient, cell line, or sample — rather than measuring it. Licklider recognizes ID columns using column name patterns (such as id, subject_id, or names ending in _id) and by detecting columns with high uniqueness across rows.
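
The two signals described above can be sketched roughly as follows. This is an illustrative heuristic, not Licklider's actual implementation: the function names, the regular expression, and the 0.9 uniqueness threshold are all assumptions made for the example.

```python
import re

# Assumed name patterns: exactly "id", or any name ending in "_id".
ID_NAME_PATTERN = re.compile(r"^(id|.*_id)$", re.IGNORECASE)

def uniqueness_ratio(values):
    """Fraction of non-missing values that are distinct (0.0 for an empty column)."""
    present = [v for v in values if v is not None and v != ""]
    if not present:
        return 0.0
    return len(set(present)) / len(present)

def looks_like_id(name, values, min_uniqueness=0.9):
    """True if the name matches an ID pattern or the values are highly unique.

    min_uniqueness is a hypothetical threshold chosen for illustration.
    """
    if ID_NAME_PATTERN.match(name.strip()):
        return True
    return uniqueness_ratio(values) >= min_uniqueness

columns = {
    "subject_id": ["s1", "s1", "s2", "s2"],
    "sample": ["a", "b", "c", "d"],
    "group": ["ctrl", "ctrl", "drug", "drug"],
}
candidates = [name for name, values in columns.items() if looks_like_id(name, values)]
print(candidates)  # ['subject_id', 'sample']
```

A continuous measurement column would also pass a uniqueness-only test, which is one reason a heuristic of this kind still needs confirmation from the user.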

Once identified, an ID column is excluded from automatic selection as an outcome variable or group variable. It will not appear as a candidate for value or group assignment unless you explicitly select it.

Why ID columns matter

ID columns underpin a critical quality check and a family of analyses:

Pseudoreplication detection — when the same ID appears more than once within a group, Licklider flags this as a potential pseudoreplication risk. It will ask you to confirm whether the repeated observations represent biological replicates or technical replicates, because the appropriate analysis differs between the two.

Paired and repeated measures analyses — for paired t-tests and repeated measures models, Licklider uses the ID column to match observations across conditions or time points. Without a correctly identified ID column, pairing cannot be established.
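
Both uses of the ID column can be illustrated with a small sketch. The row layout, the key names (subject_id, group, condition, value), and the function names are hypothetical and chosen only for this example; the logic shows the general idea rather than Licklider's own code.

```python
from collections import Counter, defaultdict

def repeated_ids_within_group(rows, id_key="subject_id", group_key="group"):
    """Return {group: [IDs that occur more than once in that group]}."""
    counts = Counter((row[group_key], row[id_key]) for row in rows)
    repeated = defaultdict(list)
    for (group, unit_id), n in counts.items():
        if n > 1:
            repeated[group].append(unit_id)
    return dict(repeated)

def pair_by_id(rows, first, second, id_key="subject_id",
               condition_key="condition", value_key="value"):
    """Match values across two conditions by ID.

    IDs that lack either condition cannot be paired and are dropped.
    """
    by_id = defaultdict(dict)
    for row in rows:
        by_id[row[id_key]][row[condition_key]] = row[value_key]
    return {
        unit_id: (vals[first], vals[second])
        for unit_id, vals in by_id.items()
        if first in vals and second in vals
    }

rows = [
    {"subject_id": "m1", "group": "ctrl", "condition": "pre", "value": 4.1},
    {"subject_id": "m1", "group": "ctrl", "condition": "post", "value": 4.8},
    {"subject_id": "m2", "group": "ctrl", "condition": "pre", "value": 3.9},
    {"subject_id": "m3", "group": "drug", "condition": "pre", "value": 4.0},
    {"subject_id": "m3", "group": "drug", "condition": "post", "value": 5.5},
]

repeated = repeated_ids_within_group(rows)
pairs = pair_by_id(rows, "pre", "post")
print(repeated)  # {'ctrl': ['m1'], 'drug': ['m3']}
print(pairs)     # {'m1': (4.1, 4.8), 'm3': (4.0, 5.5)}
```

In this toy data the repeated IDs are legitimate pre/post pairs, which is exactly the situation where a repeat flag needs a human answer about what the repeats represent.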

If your dataset contains multiple levels of nesting — for example, animals within litters, or samples within patients — you can specify multiple ID columns. Licklider stores these as a hierarchy and uses them in the appropriate checks.

Licklider cannot always infer from column names and uniqueness alone which identifier reflects the scientifically relevant observation unit. If the wrong ID is used — or if nesting and repeated observations are left implicit — pseudoreplication checks, pairing, and repeated-measures guidance can all be based on the wrong structure.

Batch columns

A batch column records an experimental grouping that was not part of the study design but may have introduced systematic variation — the day an assay was run, the plate a sample was processed on, the operator who performed the experiment, or the reagent lot used.

Licklider detects batch columns automatically using column name patterns including batch, plate, lot, run, and run order. Detection runs in the background even if you have not explicitly designated a batch column.
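
Name-based detection of this kind might look like the sketch below. The keyword set is taken from the patterns listed above; the tokenization rules and the function name are assumptions for illustration, not Licklider's actual matching logic.

```python
import re

# Keywords from the patterns listed above ("run order" is covered by the "run" token).
BATCH_TOKENS = {"batch", "plate", "lot", "run"}

def looks_like_batch(name):
    """True if any word-like token in the column name is a batch keyword."""
    # Split camelCase ("RunOrder" -> "Run_Order") before lowercasing.
    spaced = re.sub(r"([a-z])([A-Z])", r"\1_\2", name)
    tokens = re.split(r"[^a-z0-9]+", spaced.lower())
    # Strip trailing digits so "batch1" is treated like "batch".
    return any(token.rstrip("0123456789") in BATCH_TOKENS for token in tokens)

for name in ["Plate", "run order", "RunOrder", "reagent_lot", "batch1", "plot_area"]:
    print(name, looks_like_batch(name))
```

Matching whole tokens rather than substrings avoids false positives such as plot_area, which contains "lot" but is not a batch label.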

What Licklider checks

Once a batch column is identified, Licklider evaluates whether batch membership is confounded with group membership:

Complete confounding: all observations from one group are in one batch, and all observations from another group are in another. The batch effect and the treatment effect cannot be separated.
Partial confounding: batch and group are correlated but not perfectly aligned. This may affect interpretation and may require explicit adjustment in the analysis plan.

A warning appears in the Inspector when either condition is detected. This check runs automatically — you do not need to run it manually.
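
The distinction between complete and partial confounding can be made concrete with a cross-tabulation of group against batch. The sketch below is a simplified illustration under stated assumptions: it treats the layout as completely confounded when no batch contains more than one group, as unconfounded when every group has identical batch proportions, and as partial otherwise. A real check would more likely use a statistical measure of association than exact equality. The function name and row layout are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def classify_confounding(rows, group_key="group", batch_key="batch"):
    """Classify the group/batch layout as 'complete', 'partial', or 'none'."""
    cell_counts = Counter((row[group_key], row[batch_key]) for row in rows)
    groups = sorted({g for g, _ in cell_counts})
    batches = sorted({b for _, b in cell_counts})
    if len(groups) < 2 or len(batches) < 2:
        return "none"  # a single group or a single batch leaves nothing to confound

    # Complete: every batch holds observations from exactly one group.
    groups_per_batch = {
        b: {g for g in groups if cell_counts[(g, b)] > 0} for b in batches
    }
    if all(len(members) == 1 for members in groups_per_batch.values()):
        return "complete"

    # None: each group is spread over the batches in the same proportions.
    totals = {g: sum(cell_counts[(g, b)] for b in batches) for g in groups}
    profiles = {
        g: tuple(Fraction(cell_counts[(g, b)], totals[g]) for b in batches)
        for g in groups
    }
    if len(set(profiles.values())) == 1:
        return "none"
    return "partial"

def make_rows(pairs):
    return [{"group": g, "batch": b} for g, b in pairs]

complete = make_rows([("ctrl", "b1"), ("ctrl", "b1"), ("drug", "b2"), ("drug", "b2")])
balanced = make_rows([("ctrl", "b1"), ("ctrl", "b2"), ("drug", "b1"), ("drug", "b2")])
partial = make_rows([("ctrl", "b1"), ("ctrl", "b1"), ("ctrl", "b2"),
                     ("drug", "b1"), ("drug", "b2"), ("drug", "b2")])

print(classify_confounding(complete))  # complete
print(classify_confounding(balanced))  # none
print(classify_confounding(partial))   # partial
```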

Batch vs group

A batch column should reflect an unintended source of variation, not a factor you are studying. Balanced randomization of samples across batches reduces confounding risk, but batch may still matter for review and disclosure even when randomization was performed. If your experimental design includes a deliberate batch structure, confirm this in the Data Contract so that Licklider's evaluation reflects your intent.

Column names such as plate, run, lane, sequence, and experimenter map to the same concept as batch. All are handled through the same confounding detection mechanism.

Licklider cannot determine from the column name alone whether a batch-like column is truly an unwanted source of variation, an intended design factor, or part of a blocking strategy. If that meaning is wrong, the resulting confounding warning may be misleading and the analysis plan may need a different interpretation.

Timepoint columns

A timepoint column records when or at what stage an observation was made — a calendar date, a study day, a passage number, or a labeled interval such as "Baseline", "Week 4", "Week 8".

Licklider detects timepoint columns through type inference on temporal-looking values. You can also designate a timepoint column manually in the Data Contract.

Timepoint columns do more than provide an x-axis

In a line chart, the timepoint column determines the x-axis. But in the quality check layer, timepoint columns serve two additional purposes:

Attrition monitoring — Licklider counts the number of observations at each time point and tracks whether that number changes. A drop in sample size between time points is recorded and disclosed. If the drop is differential across groups, it is flagged as a potential source of bias.

Repeated measures guidance — when the same ID appears at multiple time points, Licklider can surface a repeated-measures risk and suggest a paired or repeated-measures workflow rather than a naive group comparison that does not account for within-subject correlation.
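
Attrition monitoring amounts to counting distinct IDs per group at each time point and comparing how those counts change. The following is a minimal sketch of that idea; the key names, the retention measure, and the 0.1 tolerance are assumptions for illustration and do not describe Licklider's internal thresholds.

```python
from collections import defaultdict

def n_by_timepoint(rows, time_order, group_key="group",
                   time_key="timepoint", id_key="subject_id"):
    """Count distinct IDs per group at each time point, in the given order."""
    seen = defaultdict(set)
    for row in rows:
        seen[(row[group_key], row[time_key])].add(row[id_key])
    groups = sorted({row[group_key] for row in rows})
    return {g: [len(seen[(g, t)]) for t in time_order] for g in groups}

def retention(counts):
    """Fraction of the first-timepoint N still present at the last time point."""
    return {g: ns[-1] / ns[0] for g, ns in counts.items() if ns[0] > 0}

def is_differential(counts, tolerance=0.1):
    """Flag when retention differs between groups by more than an assumed tolerance."""
    rates = list(retention(counts).values())
    return bool(rates) and max(rates) - min(rates) > tolerance

# Toy data: the control group keeps all four subjects, the drug group loses two.
layout = [
    ("ctrl", "Baseline", ["c1", "c2", "c3", "c4"]),
    ("ctrl", "Week 4", ["c1", "c2", "c3", "c4"]),
    ("drug", "Baseline", ["d1", "d2", "d3", "d4"]),
    ("drug", "Week 4", ["d1", "d2"]),
]
rows = [
    {"group": g, "timepoint": t, "subject_id": sid}
    for g, t, ids in layout
    for sid in ids
]

counts = n_by_timepoint(rows, ["Baseline", "Week 4"])
print(counts)                   # {'ctrl': [4, 4], 'drug': [4, 2]}
print(retention(counts))        # {'ctrl': 1.0, 'drug': 0.5}
print(is_differential(counts))  # True
```

Counting distinct IDs rather than rows keeps technical replicates at a single time point from inflating N.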

Timepoint values

Timepoint values can be numeric, date strings, or categorical labels. Licklider preserves the order in which they appear in the data for categorical timepoints. If your timepoints are labeled (e.g., "Baseline", "Week 4") and the data order does not match the intended sequence, specify the correct order in the Data Contract.
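
The difference between order of appearance and a declared order can be shown in a few lines. This is a sketch of the general technique with hypothetical function names; it does not show how the Data Contract stores the order.

```python
def order_of_appearance(values):
    """Distinct values in the order they first appear (dicts preserve insertion order)."""
    return list(dict.fromkeys(values))

def apply_declared_order(values, declared_order):
    """Sort distinct values by a user-declared sequence.

    Raises ValueError if the data contains a label missing from the declaration.
    """
    rank = {label: i for i, label in enumerate(declared_order)}
    missing = [v for v in order_of_appearance(values) if v not in rank]
    if missing:
        raise ValueError(f"labels missing from declared order: {missing}")
    return sorted(set(values), key=rank.__getitem__)

observed = ["Week 8", "Baseline", "Week 4", "Baseline", "Week 8"]
print(order_of_appearance(observed))
# ['Week 8', 'Baseline', 'Week 4']
print(apply_declared_order(observed, ["Baseline", "Week 4", "Week 8"]))
# ['Baseline', 'Week 4', 'Week 8']
```

Plain alphabetical sorting is not a safe substitute for either approach: as strings, "Week 12" sorts before "Week 4".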

Licklider cannot always determine from values alone whether a temporal-looking column represents follow-up time, ordered stages, passage number, or something else. If the scientific meaning or intended order is wrong, attrition tracking, repeated-measures guidance, figure ordering, and downstream interpretation can all be affected.

What happens when these columns change

ID, batch, and timepoint designations are stored in the Data Contract. If you change them after analysis has begun, Licklider detects the conflict, flags analyses that depended on the previous designation for review, and re-evaluates the related quality checks.

What this page does not cover

Declaring the observation unit and what each row represents — see Observation Unit Declaration
Mapping variables to analysis roles — see Variable and ID Mapping
How pseudoreplication is detected and reported — see Pseudoreplication Detection
How batch confounding is evaluated — see Batch and Plate Confounding Detection
How attrition across time points is tracked — see N Disclosure and Attrition Trail

Design Rationale & References

This page follows a simple rule: identifiers, batch sources, and time-related structure should be made explicit because they directly change how results are checked and interpreted. That is why Licklider tries to detect these columns automatically, asks for clarification when needed, and treats them as part of the analytical structure rather than as ordinary labels.

Lazic, S. E. (2010). The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis? BMC Neuroscience, 11, 5. https://doi.org/10.1186/1471-2202-11-5
Vaux, D. L., Fidler, F., & Cumming, G. (2012). Replicates and repeats - what is the difference and is it significant? EMBO Reports, 13(4), 291-296. https://doi.org/10.1038/embor.2012.36
Leek, J. T., Scharpf, R. B., Bravo, H. C., et al. (2010). Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Reviews Genetics, 11(10), 733-739. https://doi.org/10.1038/nrg2825