Required and Optional Columns

Licklider reads your dataset as a table where each column has a type and a role. Understanding how types and roles are assigned, and which ones a given analysis requires, helps you prepare data that imports cleanly. When the needed columns are present and assigned correctly, Licklider can move on to analysis setup, quality checks, figures, and results that match the question you are trying to answer.

Column types

When you upload a file, Licklider makes an initial guess at the type of each column by examining its values. When the type is ambiguous, it asks you to confirm.

Type	Description	Examples
`numeric`	Continuous or discrete numbers	1.2, 42, 0.005
`categorical`	Text or low-cardinality labels	"Control", "Treatment A", "Male"
`temporal`	Dates or times	2024-01-01, Day1, Week4
`id`	High-cardinality identifiers with no analytical meaning	subject IDs, sample codes
`boolean`	True/false or binary indicators	true/false, 0/1, yes/no

For ambiguous cases — for example, a column that could be either numeric or temporal — Licklider uses column name patterns (such as "date" or "time") and the proportion of successfully parseable values to decide. When no type fits clearly, the column is classified as categorical.
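The numeric-versus-temporal decision described above can be sketched roughly as follows. This is a minimal illustration, not Licklider's implementation: the function `guess_type`, the 0.9 threshold, the two date formats, and the name hints are all assumptions chosen for the example, and the `id` and `boolean` types are left out.

```python
from datetime import datetime

# Hypothetical settings for this sketch; Licklider's real values are not documented here.
TEMPORAL_NAME_HINTS = ("date", "time")
DATE_FORMATS = ("%Y-%m-%d", "%Y%m%d")
PARSE_THRESHOLD = 0.9


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def _is_date(text):
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


def _share(values, predicate):
    """Proportion of non-empty values that the predicate accepts."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return 0.0
    return sum(1 for v in cleaned if predicate(v)) / len(cleaned)


def guess_type(name, values):
    """Guess 'numeric', 'temporal', or 'categorical' for one column of strings."""
    looks_numeric = _share(values, _is_number) >= PARSE_THRESHOLD
    looks_temporal = _share(values, _is_date) >= PARSE_THRESHOLD
    if looks_numeric and looks_temporal:
        # Ambiguous: let the column name break the tie.
        hinted = any(hint in name.lower() for hint in TEMPORAL_NAME_HINTS)
        return "temporal" if hinted else "numeric"
    if looks_temporal:
        return "temporal"
    if looks_numeric:
        return "numeric"
    # Nothing fits clearly: fall back to categorical.
    return "categorical"
```

In this sketch, values such as 20240101 parse both as numbers and as dates, so a column named `visit_date` resolves to temporal while the same values in a column named `count` resolve to numeric.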

You do not need to pre-format columns to specific types before importing.

Column roles

Type describes what the values in a column look like. Role describes what a column does in an analysis.

The same column can take different roles in different analyses. A numeric column might be the outcome variable (`value`) in one analysis and the x-axis variable (`x`) in another. A categorical column might be the grouping variable (`group`) in one context and an unreferenced attribute in another. Type stays fixed; role is assigned per analysis.

Dataset-level roles

When a dataset is first set up, Licklider assigns each column one of three broad roles:

Role	Meaning
`measure`	A numeric variable available as an outcome or predictor
`identifier`	A column that identifies an observation unit — a subject, sample, or animal — rather than measuring it
`attribute`	A grouping, condition, batch, or other descriptive variable

Analysis-specific roles

When a specific figure or test is requested, columns are mapped to more precise roles. Common analysis roles include:

Role	Meaning
`value`	The outcome variable for this analysis
`group`	The grouping variable (e.g., Treatment vs Control)
`time`	A time or sequence column
`x` / `y`	Axes for scatter plots and regression
`event`	A binary outcome indicator used in survival analysis
`pair_column`	A subject or unit ID used to match paired observations

Roles are inferred from dataset-level assignments and column types. When inference is insufficient, Licklider asks you to specify the role explicitly through the Chat setup flow.
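One way to picture the relationship between type, dataset-level role, and analysis role is the sketch below. The `Column` class, the `ELIGIBLE` table, and `infer_role` are hypothetical names invented for illustration, and the eligibility rules are assumptions rather than Licklider's actual logic. The point is only that a unique candidate can be assigned automatically, while zero or several candidates mean the user has to be asked.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Column:
    name: str
    type: str          # fixed per column, e.g. "numeric", "categorical", "id"
    dataset_role: str  # "measure", "identifier", or "attribute"


# Assumed eligibility rules: analysis role -> (dataset-level role, column type).
ELIGIBLE = {
    "value": ("measure", "numeric"),
    "group": ("attribute", "categorical"),
    "pair_column": ("identifier", "id"),
}


def infer_role(analysis_role: str, columns: Sequence[Column]) -> Optional[str]:
    """Return the single matching column name, or None when the user must choose."""
    dataset_role, col_type = ELIGIBLE[analysis_role]
    matches = [
        c.name
        for c in columns
        if c.dataset_role == dataset_role and c.type == col_type
    ]
    return matches[0] if len(matches) == 1 else None


columns = [
    Column("weight", "numeric", "measure"),
    Column("arm", "categorical", "attribute"),
    Column("subject", "id", "identifier"),
]
outcome = infer_role("value", columns)  # exactly one numeric measure, so "weight"
```

If a second numeric measure were added to `columns`, `infer_role("value", ...)` would return `None` in this sketch, which corresponds to the case where the setup flow asks for an explicit choice.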

Required vs optional columns

No column is required in every dataset. What is required depends entirely on the analysis or figure you are running.

For most group comparison analyses (t-test, ANOVA, non-parametric alternatives), you need at minimum:

A numeric outcome column (`value`)
A column that identifies which group each observation belongs to (`group`)

For analyses where the structure is more specific:

Analysis or figure	Required columns
Paired t-test / repeated measures	A subject or unit ID used for matching (`pair_column`)
Line chart	A time or sequence column
Scatter plot / regression	An x-axis column and a y-axis column
Survival analysis	A time-to-event column and a binary event indicator
Chi-square test	Two categorical columns

Optional for most analyses:

A batch or plate column (used for confounding detection)
Additional covariates
A label column for annotating individual points in figures

If a column required for the requested analysis is missing, Licklider returns an error identifying which column is absent and does not proceed.
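The stop-on-missing-column behavior can be illustrated with a small check like the one below. It is a sketch under stated assumptions: the analysis keys, the `REQUIRED_ROLES` table, `MissingColumnError`, and `check_required` are hypothetical names, and only a few rows from the lists above are encoded.

```python
# Hypothetical analysis keys mapped to the roles this page lists for them.
REQUIRED_ROLES = {
    "group_comparison": ("value", "group"),
    "paired_t_test": ("pair_column",),  # the column listed in the table above
    "scatter": ("x", "y"),
}


class MissingColumnError(ValueError):
    """Raised when a requested analysis has unassigned required roles."""

    def __init__(self, analysis, missing):
        self.analysis = analysis
        self.missing = list(missing)
        roles = ", ".join(self.missing)
        super().__init__(f"{analysis}: no column assigned to role(s): {roles}")


def check_required(analysis, mapping):
    """Raise MissingColumnError unless every required role has a column.

    `mapping` is a dict of role name -> column name for the requested analysis.
    """
    missing = [role for role in REQUIRED_ROLES[analysis] if not mapping.get(role)]
    if missing:
        raise MissingColumnError(analysis, missing)
```

Calling `check_required("group_comparison", {"value": "weight"})` raises an error whose `missing` attribute is `["group"]`, while a complete mapping returns without raising.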

How Licklider asks about columns

When the required column assignments cannot be determined from the data and dataset setup alone, Licklider asks. The setup flow presents the available columns as options and asks you to confirm or correct the assignment.

For example:

"Which column contains your outcome variable?"
"Which column identifies the group each observation belongs to?"
"Which column identifies the subject across time points?"

Licklider cannot always determine from column names and values alone which scientific role a column should play. Observation unit, pairing, replication structure, and causal meaning may still require your input. If those assignments are wrong, Licklider may carry the wrong mapping into analysis setup, quality checks, figures, and interpretation.

You do not need to rename columns before importing. Licklider resolves assignments through this flow regardless of what your columns are named.

Column naming

Licklider normalizes column names on import:

Unicode is normalized (NFKC)
Invisible characters such as zero-width spaces are removed
Leading and trailing whitespace is trimmed

If two columns produce the same name after normalization, both are retained but the collision is recorded as a warning. This may cause unexpected behavior in analyses that reference those columns by name. Rename the columns before importing to avoid this.

Column names are not otherwise restricted. Spaces, special characters, and non-English characters are all accepted.
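To check a header row for collisions before importing, the normalization steps listed above can be approximated with Python's standard library. This is a sketch, not Licklider's code: the set of invisible characters and the order of the steps are assumptions, and `normalize_name` and `find_collisions` are hypothetical helper names.

```python
import unicodedata
from collections import defaultdict

# Assumed set of invisible characters (zero-width space, non-joiner, joiner,
# word joiner, byte-order mark). The page names zero-width spaces only as an example.
INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def normalize_name(raw):
    """Apply NFKC, drop invisible characters, then trim surrounding whitespace."""
    name = unicodedata.normalize("NFKC", raw)
    name = name.translate(INVISIBLE)
    return name.strip()


def find_collisions(raw_names):
    """Map each normalized name to the raw names that collapse onto it."""
    by_normalized = defaultdict(list)
    for raw in raw_names:
        by_normalized[normalize_name(raw)].append(raw)
    return {name: raws for name, raws in by_normalized.items() if len(raws) > 1}
```

For example, `find_collisions(["Dose", "Dose ", "Group"])` reports that `"Dose"` and `"Dose "` collapse to the same name, which is the situation that would trigger the warning described above.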

What this page does not cover

Handling ID, batch, and timepoint columns specifically — see ID, Batch, and Timepoint Columns
Declaring what each row in your dataset represents — see Observation Unit Declaration
Mapping variables to analysis roles in the Data Contract — see Variable and ID Mapping
Common errors that arise from column structure — see Common Import Errors

Design Rationale & References

This page follows a simple rule: columns that matter to the analysis should be represented and confirmed explicitly, rather than assumed from names alone. That is why Licklider separates column type from column role, asks for confirmation when inference is insufficient, and treats observation-level identifiers as analytically important structure rather than as ordinary labels.

Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10), 1-23. https://doi.org/10.18637/jss.v059.i10
Lazic, S. E. (2010). The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis? BMC Neuroscience, 11, 5. https://doi.org/10.1186/1471-2202-11-5