Required and Optional Columns

How Licklider interprets columns in your dataset, which columns are required for a given analysis, and how to handle column naming issues.

Licklider reads your dataset as a table where each column has a type and a role. Understanding how types and roles are assigned, and which ones a given analysis requires, helps you prepare data that works without extra setup. When the needed columns are present and assigned correctly, Licklider can move on to analysis setup, quality checks, figures, and results that match the question you are trying to answer.


Column types

When you upload a file, Licklider makes an initial guess at the type of each column by examining its values. When the type is ambiguous, it asks you to confirm.

Type | Description | Examples
numeric | Continuous or discrete numbers | 1.2, 42, 0.005
categorical | Text or low-cardinality labels | "Control", "Treatment A", "Male"
temporal | Dates or times | 2024-01-01, Day1, Week4
id | High-cardinality identifiers that label observations rather than measure them | subject IDs, sample codes
boolean | True/false or binary indicators | true/false, 0/1, yes/no

For ambiguous cases — for example, a column that could be either numeric or temporal — Licklider uses column name patterns (such as "date" or "time") and the proportion of successfully parseable values to decide. When no type fits clearly, the column is classified as categorical.
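The inference order described above can be sketched as a small heuristic. This is a minimal illustration, not Licklider's implementation: the function name `infer_column_type`, the 0.9 parse threshold, the 20-row minimum for IDs, and the name hints are all hypothetical choices made for the example.

```python
import re
from datetime import date

# Hypothetical heuristics; Licklider's actual rules and thresholds are not documented here.
TEMPORAL_NAME_HINTS = ("date", "time", "day", "week")
PARSE_THRESHOLD = 0.9   # share of values that must parse for a type to win
ID_MIN_ROWS = 20        # all-unique text columns at least this long are treated as IDs
BOOLEAN_TOKENS = {"true", "false", "yes", "no", "0", "1"}
SEQUENCE_LABEL = re.compile(r"(day|week)\s*\d+", re.IGNORECASE)


def parses_as_number(value: str) -> bool:
    try:
        float(value)
    except ValueError:
        return False
    return True


def parses_as_temporal(value: str) -> bool:
    # ISO dates such as 2024-01-01, or sequence labels such as Day1 / Week4.
    try:
        date.fromisoformat(value)
    except ValueError:
        return SEQUENCE_LABEL.fullmatch(value) is not None
    return True


def share(values: list[str], predicate) -> float:
    return sum(1 for v in values if predicate(v)) / len(values)


def infer_column_type(name: str, values: list[str]) -> str:
    """Propose a column type from the column name and its raw string values."""
    cleaned = [v.strip() for v in values if v.strip()]
    if not cleaned:
        return "categorical"  # nothing to go on: fall back to categorical
    lowered = {v.lower() for v in cleaned}
    name_hints_time = any(hint in name.lower() for hint in TEMPORAL_NAME_HINTS)

    if len(lowered) <= 2 and lowered <= BOOLEAN_TOKENS:
        return "boolean"
    if share(cleaned, parses_as_temporal) >= PARSE_THRESHOLD:
        return "temporal"
    if share(cleaned, parses_as_number) >= PARSE_THRESHOLD:
        # Numbers in a column named like "Day" or "time" are read as a sequence.
        return "temporal" if name_hints_time else "numeric"
    if len(set(cleaned)) == len(cleaned) and len(cleaned) >= ID_MIN_ROWS:
        return "id"
    return "categorical"  # no type fits clearly
```

For example, `infer_column_type("Day", ["1", "2", "3"])` returns `"temporal"` because the name hint breaks the numeric/temporal tie, while the same values under the name `"Weight"` would be classified as `"numeric"`.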

You do not need to pre-format columns to specific types before importing.


Column roles

Type describes what the values in a column look like. Role describes what a column does in an analysis.

The same column can take different roles in different analyses. A numeric column might be the outcome variable (value) in one analysis and the x-axis variable (x) in another. A categorical column might be the grouping variable (group) in one context and an attribute the analysis does not use in another. Type stays fixed; role is assigned per analysis.

Dataset-level roles

When a dataset is first set up, Licklider assigns each column one of three broad roles:

Role | Meaning
measure | A numeric variable available as an outcome or predictor
identifier | A column that identifies an observation unit (a subject, sample, or animal) rather than measuring it
attribute | A grouping, condition, batch, or other descriptive variable
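One way to picture how dataset-level roles relate to types is a default mapping that you can override. The sketch below assumes numeric columns default to measure, id columns to identifier, and everything else to attribute; those defaults and the `assign_dataset_roles` name are illustrative, not Licklider's documented rules.

```python
from enum import Enum
from typing import Optional


class DatasetRole(Enum):
    MEASURE = "measure"        # numeric variable usable as an outcome or predictor
    IDENTIFIER = "identifier"  # names the observation unit (subject, sample, animal)
    ATTRIBUTE = "attribute"    # grouping, condition, batch, or other descriptor


# Hypothetical defaults: a column's type suggests a starting role.
DEFAULT_ROLE_BY_TYPE = {
    "numeric": DatasetRole.MEASURE,
    "id": DatasetRole.IDENTIFIER,
}


def assign_dataset_roles(
    column_types: dict[str, str],
    overrides: Optional[dict[str, DatasetRole]] = None,
) -> dict[str, DatasetRole]:
    """Give every column exactly one dataset-level role; user overrides win."""
    overrides = overrides or {}
    return {
        column: overrides.get(column, DEFAULT_ROLE_BY_TYPE.get(ctype, DatasetRole.ATTRIBUTE))
        for column, ctype in column_types.items()
    }
```

The override matters for columns such as numeric animal numbers: a type-based default would treat them as a measure, even though they identify the observation unit.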

Analysis-specific roles

When a specific figure or test is requested, columns are mapped to more precise roles. Common analysis roles include:

Role | Meaning
value | The outcome variable for this analysis
group | The grouping variable (e.g., Treatment vs Control)
time | A time or sequence column
x / y | Axes for scatter plots and regression
event | A binary outcome indicator used in survival analysis
pair_column | A subject or unit ID used to match paired observations

Roles are inferred from dataset-level assignments and column types. When inference is insufficient, Licklider asks you to specify the role explicitly through the Chat setup flow.
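The inference-then-ask pattern can be sketched as follows. This is a hypothetical illustration: the candidate rules in `ROLE_RULES` and the policy "fill a role only when exactly one unused column qualifies" are assumptions, and roles without a rule (such as x and y) are always deferred to you.

```python
from typing import Callable

# Hypothetical candidate rules: which columns could fill each analysis role.
# Each rule sees a column's dataset-level role and its type.
ROLE_RULES: dict[str, Callable[[str, str], bool]] = {
    "value": lambda role, ctype: role == "measure",
    "group": lambda role, ctype: role == "attribute" and ctype in ("categorical", "boolean"),
    "time": lambda role, ctype: ctype == "temporal",
    "event": lambda role, ctype: ctype == "boolean",
    "pair_column": lambda role, ctype: role == "identifier",
}


def map_analysis_roles(
    needed: list[str],
    dataset_roles: dict[str, str],
    column_types: dict[str, str],
) -> tuple[dict[str, str], list[str]]:
    """Fill each needed role when exactly one unused column qualifies; otherwise defer to the user."""
    assigned: dict[str, str] = {}
    unresolved: list[str] = []
    for analysis_role in needed:
        rule = ROLE_RULES.get(analysis_role)
        candidates = [
            column for column in dataset_roles
            if rule is not None
            and column not in assigned.values()  # one column fills at most one role
            and rule(dataset_roles[column], column_types[column])
        ]
        if len(candidates) == 1:
            assigned[analysis_role] = candidates[0]
        else:
            unresolved.append(analysis_role)  # zero or several candidates: ask
    return assigned, unresolved
```

With one measure column, the outcome is filled automatically; add a second measure and `value` lands in the unresolved list, which is the point where the setup flow would ask.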


Required vs optional columns

No column is required in every dataset. What is required depends entirely on the analysis or figure you are running.

For most group comparison analyses (t-test, ANOVA, non-parametric alternatives), you need at minimum:

  • A numeric outcome column (value)
  • A column that identifies which group each observation belongs to (group)

For analyses where the structure is more specific:

Analysis or figure | Required columns
Paired t-test / repeated measures | A subject or unit ID used for matching (pair ID)
Line chart | A time or sequence column
Scatter plot / regression | An x-axis column and a y-axis column
Survival analysis | A time-to-event column and a binary event indicator
Chi-square test | Two categorical columns

Optional for most analyses:

  • A batch or plate column (used for confounding detection)
  • Additional covariates
  • A label column for annotating individual points in figures

If a column required for the requested analysis is missing, Licklider returns an error identifying which column is absent and does not proceed.
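The requirement table and the missing-column error can be expressed as a lookup plus a check. This is a hypothetical sketch: the analysis keys, the `MissingColumnError` name, and the role lists are illustrative. The paired and line-chart entries also include the outcome (`value`), which those analyses test or plot; chi-square is left out because its two categorical inputs have no named role on this page.

```python
# Hypothetical analysis keys and role lists, following the requirements table.
REQUIRED_ROLES: dict[str, tuple[str, ...]] = {
    "t_test": ("value", "group"),
    "anova": ("value", "group"),
    "paired_t_test": ("value", "group", "pair_column"),
    "line_chart": ("time", "value"),
    "scatter": ("x", "y"),
    "survival": ("time", "event"),
}


class MissingColumnError(ValueError):
    """Raised when an analysis lacks a column for one or more required roles."""

    def __init__(self, analysis: str, missing: list[str]):
        self.analysis = analysis
        self.missing = missing
        super().__init__(f"{analysis} needs a column for: {', '.join(missing)}")


def check_required(analysis: str, assignments: dict[str, str], columns: set[str]) -> None:
    """Stop before analysis if any required role is unassigned or points at an absent column."""
    missing = [
        role for role in REQUIRED_ROLES[analysis]
        if assignments.get(role) not in columns
    ]
    if missing:
        raise MissingColumnError(analysis, missing)
```

Reporting the role names (rather than a generic failure) mirrors the behavior described above: the error says which column is absent, and nothing downstream runs.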


How Licklider asks about columns

When the required column assignments cannot be determined from the data and dataset setup alone, Licklider asks. The setup flow presents the available columns as options and asks you to confirm or correct the assignment.

For example:

  • "Which column contains your outcome variable?"
  • "Which column identifies the group each observation belongs to?"
  • "Which column identifies the subject across time points?"
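The confirm-or-correct step can be modeled as a list of questions, each offering every available column as an option. This is a hypothetical sketch: `SetupQuestion`, `build_questions`, `apply_answer`, and the fallback wording are illustrative names, and only the three question texts come from the examples above.

```python
from dataclasses import dataclass

# Question wording from the examples above; the fallback wording is hypothetical.
QUESTIONS = {
    "value": "Which column contains your outcome variable?",
    "group": "Which column identifies the group each observation belongs to?",
    "pair_column": "Which column identifies the subject across time points?",
}


@dataclass
class SetupQuestion:
    role: str
    question: str
    options: list[str]  # every available column is offered as an answer


def build_questions(unresolved: list[str], columns: list[str]) -> list[SetupQuestion]:
    """Turn each unresolved role into a question listing the available columns."""
    return [
        SetupQuestion(role, QUESTIONS.get(role, f"Which column should be used as {role}?"), list(columns))
        for role in unresolved
    ]


def apply_answer(assignments: dict[str, str], q: SetupQuestion, answer: str) -> dict[str, str]:
    """Record the user's choice, rejecting anything that is not an offered column."""
    if answer not in q.options:
        raise ValueError(f"{answer!r} is not one of the available columns")
    return {**assignments, q.role: answer}
```

Returning a new dictionary from `apply_answer` keeps earlier assignments intact, so a corrected answer simply replaces the previous value for that role.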

Licklider cannot always determine from column names and values alone which scientific role a column should play. Observation unit, pairing, replication structure, and causal meaning may still require your input. If those assignments are wrong, Licklider may carry the wrong mapping into analysis setup, quality checks, figures, and interpretation.

You do not need to rename columns before importing. Licklider resolves assignments through this flow regardless of what your columns are named.


Column naming

Licklider normalizes column names on import:

  • Unicode is normalized (NFKC)
  • Invisible characters such as zero-width spaces are removed
  • Leading and trailing whitespace is trimmed

If two columns produce the same name after normalization, both are retained, but the collision is recorded as a warning. This may cause unexpected behavior in analyses that reference those columns by name. To avoid this, rename the columns before importing.
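The three normalization steps and the collision warning can be sketched with Python's standard library. The exact set of invisible characters Licklider removes is not listed on this page; the set below (zero-width space, zero-width non-joiner and joiner, word joiner, and byte-order mark) is an assumption, as are the function names.

```python
import unicodedata
from collections import Counter

# Hypothetical set of invisible characters to delete (mapped to None for str.translate).
INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def normalize_name(name: str) -> str:
    name = unicodedata.normalize("NFKC", name)  # e.g. fullwidth letters become ASCII
    name = name.translate(INVISIBLE)            # NFKC keeps zero-width characters, so drop them
    return name.strip()                         # trim leading and trailing whitespace


def normalize_columns(names: list[str]) -> tuple[list[str], list[str]]:
    """Normalize every name, keep all columns, and warn about any collisions."""
    normalized = [normalize_name(n) for n in names]
    counts = Counter(normalized)
    warnings = [
        f"{count} columns normalize to {name!r}"
        for name, count in counts.items()
        if count > 1
    ]
    return normalized, warnings
```

With the input `["Weight", " Weight\u200b", "Ｇｒｏｕｐ"]`, all three columns are kept as `["Weight", "Weight", "Group"]` and one collision warning is produced for `"Weight"`.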

Column names are not otherwise restricted. Spaces, special characters, and non-English characters are all accepted.



Design Rationale & References

This page follows a simple rule: columns that matter to the analysis should be represented and confirmed explicitly, rather than assumed from names alone. That is why Licklider separates column type from column role, asks for confirmation when inference is insufficient, and treats observation-level identifiers as analytically important structure rather than as ordinary labels.

  1. Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10), 1-23. https://doi.org/10.18637/jss.v059.i10
  2. Lazic, S. E. (2010). The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis? BMC Neuroscience, 11, 5. https://doi.org/10.1186/1471-2202-11-5