Observation Unit Declaration

How to declare what each row in your dataset represents, and why this matters for how Licklider interprets your data.

For a new dataset without a confirmed declaration, one of the first things Licklider asks is: what does each row represent?

This is the observation unit declaration. It tells Licklider the biological or experimental entity that each row in your data corresponds to — an animal, a patient, a cell, a well, or something else. The answer shapes how Licklider interprets your data structure, which quality checks it applies, and how it handles replication and independence.


Why it matters

The observation unit is not just a label. It determines:

  • Which columns are treated as identifiers. If each row represents an animal, the column that identifies which animal a row belongs to becomes the subject ID. This ID is used for pairing in repeated measures analyses and for detecting pseudoreplication.

  • How independence is assessed. If multiple rows share the same observation unit (for example, three technical measurements from the same cell), Licklider can flag this as a potential pseudoreplication risk and ask how the replication is structured.

  • What quality checks are applied. Once the observation unit is declared, Licklider can check whether your analysis treats repeated observations from the same unit as independent, and surface a warning when it does.
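
To make the independence point concrete, the sketch below counts how many rows share each unit ID and reports the IDs that repeat. It is only an illustration of the kind of pattern that could trigger a pseudoreplication flag, not Licklider's implementation; the function name `repeated_unit_ids` and the column names are hypothetical.

```python
from collections import Counter


def repeated_unit_ids(rows, id_column):
    """Return {unit_id: row_count} for IDs that appear on more than one row.

    Hypothetical helper for illustration only.
    """
    counts = Counter(row[id_column] for row in rows)
    return {unit_id: n for unit_id, n in counts.items() if n > 1}


# Example data (invented): animal A1 contributes three rows.
rows = [
    {"animal_id": "A1", "group": "control", "value": 4.1},
    {"animal_id": "A1", "group": "control", "value": 4.3},
    {"animal_id": "A1", "group": "control", "value": 4.0},
    {"animal_id": "A2", "group": "control", "value": 5.2},
    {"animal_id": "B1", "group": "treated", "value": 6.8},
]

flagged = repeated_unit_ids(rows, "animal_id")
```

Here `flagged` contains only `A1`, because it is the only animal with more than one row. Whether those three rows are technical or biological replicates is a separate question that the data alone cannot answer.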


How to declare the observation unit

The declaration wizard appears automatically when a dataset is first loaded, if no declaration has been made yet. It can also be accessed from the Data Contract panel.

Step 1: Select the observation unit type

Option       | Typical use
-------------|------------------------------------------------------------
Animal       | In vivo experiments; each row is one animal
Donor        | Human tissue or cell donation studies
Patient      | Clinical studies; each row is one patient
Cell         | Cell-level measurements; each row is one cell
Tissue Slice | Histology or slice physiology experiments
Well         | Plate-based assays; each row is one well
Sample       | Generic biological sample
Measurement  | Each row is a single readout; the higher-level unit is declared separately if needed
Other        | Custom label for units not listed above

Step 2: Specify hierarchy (if applicable)

For observation units that are nested within a larger unit — for example, cells within an animal, or wells within a donor — a second step asks you to specify the parent unit and the column that identifies it.

This step applies to the Cell, Tissue Slice, Well, and Measurement unit types. For these types, the parent is treated as the higher-level biological unit, and the parent ID is used as a primary input to independence-related checks.

For all other unit types, the selected unit is treated as the leaf unit, and its ID column is used directly.
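
The rule above can be sketched as a small function that picks which ID column feeds independence-related checks. This is a minimal sketch under stated assumptions: the function name and arguments are hypothetical, and falling back to the unit's own ID column when a nested type has no parent declared is an assumption of this example, not documented behavior.

```python
# Unit types for which the hierarchy step applies, as listed in Step 2.
NESTED_UNIT_TYPES = {"Cell", "Tissue Slice", "Well", "Measurement"}


def independence_id_column(unit_type, unit_id_column, parent_id_column=None):
    """Pick the ID column used as input to independence-related checks.

    Hypothetical helper. Nested types use the parent's ID column when one
    was declared; every other case uses the unit's own ID column.
    """
    if unit_type in NESTED_UNIT_TYPES and parent_id_column is not None:
        return parent_id_column
    # Assumption: with no parent declared, fall back to the unit's own ID.
    return unit_id_column


cells_in_animals = independence_id_column("Cell", "cell_id", "animal_id")
animals_only = independence_id_column("Animal", "animal_id")
```

With cells nested in animals, the result is `animal_id`; for a plain Animal declaration, the animal's own ID column is used directly.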


What gets recorded

The declaration is saved to the Data Contract as a user-declared setting. It is not overwritten by automatic inference once you have confirmed it.

The following information is recorded:

  • The observation unit type (e.g., Animal)
  • The subject ID column (if applicable)
  • The parent unit and its ID column (if a nested structure was specified)

This information is visible in the Inspector under the Data Contract section.
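
As a rough picture of what such a record might contain, the sketch below models the three recorded items as a small immutable structure. The class name, field names, and the `source` marker are all hypothetical; the actual storage format of the Data Contract is not described on this page.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ObservationUnitDeclaration:
    """Hypothetical shape of a saved declaration (illustration only)."""

    unit_type: str                           # e.g. "Animal" or "Cell"
    subject_id_column: Optional[str] = None  # if applicable
    parent_unit: Optional[str] = None        # only for nested structures
    parent_id_column: Optional[str] = None   # only for nested structures
    source: str = "user_declared"            # assumed marker, not a documented field


# Cells nested within animals.
declaration = ObservationUnitDeclaration(
    unit_type="Cell",
    subject_id_column="cell_id",
    parent_unit="Animal",
    parent_id_column="animal_id",
)
record = asdict(declaration)
```

Making the structure frozen mirrors the idea that a confirmed declaration is replaced deliberately rather than modified in passing.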


What you will see after declaration

Once the observation unit is declared or confirmed, Licklider uses it as a structural input to downstream checks. In practice, you should expect to see:

  • The declared observation unit and its ID column in the Inspector under Data Contract
  • A parent unit and parent ID, when you declare a nested structure such as cells within an animal or wells within a donor
  • Independence-related checks evaluated against that declared structure
  • Warnings or confirmation requests when repeated rows suggest that the same biological unit may have been counted more than once
  • Related checks marked for re-evaluation if you later change the declaration

This declaration does not itself produce a p-value or effect size. Instead, it determines whether later analyses and quality checks are interpreted against the correct unit of replication.


What happens if you skip the declaration

If no declaration has been made, Licklider attempts an initial inference from the data profile — for example, from identifier-like columns or repeated-row patterns — and marks the result as needing confirmation when the structure is not fully resolved.

In this state, some quality checks that depend on the observation unit — particularly independence checks and pseudoreplication detection — operate with reduced certainty. The Inspector will indicate that confirmation is needed.

That means Licklider may be able to surface a plausible structure, but it cannot treat that structure as fully confirmed until you verify what each row represents.

You can complete the declaration at any point. Once confirmed, the quality checks are re-evaluated using the confirmed declaration.
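
To give a feel for what inference from identifier-like columns could look like, here is a deliberately simple heuristic: it treats columns whose names end in "id" as candidates and notes whether their values repeat. The naming rule, function name, and output fields are assumptions made for this example; Licklider's actual inference is not specified here and is likely more involved.

```python
import re

# Assumed naming rule: "id" alone, or a name ending in "_id" (case-insensitive).
ID_NAME = re.compile(r"(^|_)id$", re.IGNORECASE)


def infer_candidate_id_columns(rows):
    """List identifier-like columns and whether their values repeat.

    Hypothetical heuristic. Every inferred candidate is marked as
    needing confirmation, since inference is not a declaration.
    """
    if not rows:
        return []
    candidates = []
    for column in rows[0]:
        if not ID_NAME.search(column):
            continue
        values = [row[column] for row in rows]
        candidates.append(
            {
                "column": column,
                "has_repeats": len(set(values)) < len(values),
                "needs_confirmation": True,
            }
        )
    return candidates


rows = [
    {"cell_id": "c1", "animal_id": "A1", "intensity": 0.82},
    {"cell_id": "c2", "animal_id": "A1", "intensity": 0.79},
    {"cell_id": "c3", "animal_id": "A2", "intensity": 0.91},
]
candidates = infer_candidate_id_columns(rows)
```

In this example `cell_id` is unique per row while `animal_id` repeats, which is the kind of pattern that suggests cells nested within animals but still needs your confirmation.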


Changing the declaration

If you need to change the observation unit after the initial declaration, you can do so from the Data Contract panel. The new declaration replaces the previous one.

When the observation unit changes, quality checks that depend on it are flagged for re-evaluation. Results that were generated under the previous declaration should be reviewed before use.
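
The replace-then-flag behavior can be pictured with the sketch below. It is an illustration under assumptions: the dictionary layout, the `depends_on_observation_unit` flag, and the status strings are hypothetical and do not describe Licklider's internal data model.

```python
def apply_new_declaration(contract, new_declaration, checks):
    """Replace the stored declaration and flag dependent checks.

    Hypothetical helper. Returns True when the declaration actually changed.
    """
    changed = contract.get("observation_unit") != new_declaration
    contract["observation_unit"] = new_declaration
    if changed:
        for check in checks:
            if check["depends_on_observation_unit"]:
                check["status"] = "needs_re_evaluation"
    return changed


contract = {"observation_unit": {"unit_type": "Cell", "id_column": "cell_id"}}
checks = [
    {"name": "independence", "depends_on_observation_unit": True, "status": "passed"},
    {"name": "missing_values", "depends_on_observation_unit": False, "status": "passed"},
]

changed = apply_new_declaration(
    contract, {"unit_type": "Animal", "id_column": "animal_id"}, checks
)
```

Only the check that depends on the observation unit is flagged; the unrelated check keeps its previous status.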


Biological vs technical replication

The observation unit declaration tells Licklider what each row represents, but it does not fully resolve the question of biological versus technical replication. That distinction is made in a separate step when Licklider detects that the same subject ID appears multiple times within a group.

At that point, Licklider asks whether the repeated observations are biological replicates (independent experimental units) or technical replicates (repeated measurements of the same unit). The appropriate analysis differs between the two.

This distinction matters because the number of independent biological units, not the number of rows alone, determines the effective replication for many statistical comparisons.
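
A small worked example shows the gap between row count and independent units. The sketch averages technical replicates within each animal, which is only one of several possible treatments and is used here purely for illustration; the function name, column names, and data are invented.

```python
from collections import defaultdict
from statistics import mean


def collapse_technical_replicates(rows, unit_column, value_column):
    """Average repeated measurements within each unit.

    Hypothetical helper. Averaging is one option; modeling the nesting
    explicitly is another, and the right choice depends on the design.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[unit_column]].append(row[value_column])
    return {unit: mean(values) for unit, values in grouped.items()}


# Five rows, but only two animals.
rows = [
    {"animal_id": "A1", "value": 4.0},
    {"animal_id": "A1", "value": 5.0},
    {"animal_id": "A1", "value": 6.0},
    {"animal_id": "A2", "value": 7.0},
    {"animal_id": "A2", "value": 9.0},
]

per_animal = collapse_technical_replicates(rows, "animal_id", "value")
n_rows = len(rows)         # number of rows in the dataset
n_units = len(per_animal)  # number of independent animals
```

The dataset has five rows but only two animals, so a comparison that treats each row as independent would work with n = 5 where the design supports n = 2.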


What Licklider can and cannot determine automatically

Licklider can infer candidate observation-unit structure from patterns in the dataset, such as identifier-like columns, repeated rows, and nested-looking IDs. It can also flag cases where the declared structure appears to conflict with the row pattern.

However, Licklider cannot recover study design information that is not encoded in the data or that has been declared incorrectly. In particular, Licklider cannot determine with certainty:

  • Whether a reused ID truly represents the same biological unit or a naming collision across batches or files
  • Whether rows that look independent are actually nested within a higher-level unit that is missing from the dataset
  • Whether a technically repeated measurement should be averaged, modeled explicitly, or excluded without knowing the study design and analysis intent
  • Whether a user-confirmed declaration is scientifically correct if the wrong subject ID or parent ID was chosen

These limits matter because a wrong or incomplete declaration can make an analysis appear better supported than it really is. For example, if multiple cells from the same animal are analyzed as if they came from different animals, the apparent sample size can be overstated and independence-related checks may not reflect the true biological unit.
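
The first limit in the list above, a reused ID that may be a naming collision, can at least be surfaced for review even though it cannot be resolved automatically. The sketch below lists IDs that appear in more than one batch; the function name and the `batch` column are hypothetical, and a hit only means the ID deserves a closer look.

```python
from collections import defaultdict


def ids_spanning_batches(rows, id_column, batch_column):
    """Return {unit_id: sorted batch labels} for IDs seen in several batches.

    Hypothetical helper. It cannot tell a genuine repeat from a naming
    collision; it only points at IDs worth checking by hand.
    """
    batches_by_id = defaultdict(set)
    for row in rows:
        batches_by_id[row[id_column]].add(row[batch_column])
    return {
        unit_id: sorted(batches)
        for unit_id, batches in batches_by_id.items()
        if len(batches) > 1
    }


rows = [
    {"mouse_id": "M1", "batch": "batch1", "value": 1.2},
    {"mouse_id": "M2", "batch": "batch1", "value": 1.5},
    {"mouse_id": "M1", "batch": "batch2", "value": 2.1},
]

to_review = ids_spanning_batches(rows, "mouse_id", "batch")
```

Here `M1` appears in both batches. Only knowledge of the study can say whether that is the same mouse measured twice or two different mice that happen to share a label.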

For more detail, see Pseudoreplication Detection.


What this page does not cover


Design Rationale & References

This page follows a simple rule: the unit of observation should be declared explicitly because independence is a property of the experimental design, not just of the row count. That is why Licklider asks what each row represents, uses parent units for nested structures, preserves the confirmed declaration instead of silently overwriting it, and re-evaluates related checks when the declaration changes.

These design choices are intended to reduce a common failure mode in experimental data analysis: confusing repeated measurements or nested observations with independent biological replication.

  1. Hurlbert, S. H. (1984). Pseudoreplication and the Design of Ecological Field Experiments. Ecological Monographs, 54(2), 187-211. https://doi.org/10.2307/1942661
  2. Lazic, S. E. (2010). The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis? BMC Neuroscience, 11, 5. https://doi.org/10.1186/1471-2202-11-5
  3. Vaux, D. L., Fidler, F., & Cumming, G. (2012). Replicates and repeats - what is the difference and is it significant? EMBO Reports, 13(4), 291-296. https://doi.org/10.1038/embor.2012.36