Required and Optional Columns
How Licklider interprets columns in your dataset, which columns are required for a given analysis, and how to handle column naming issues.
Licklider reads your dataset as a table where each column has a type and a role. Understanding how types and roles are assigned — and which ones are required for a given analysis — helps you bring data that works without friction. When the needed columns are present and assigned correctly, Licklider can move on to analysis setup, quality checks, figures, and results that match the question you are trying to answer.
Column types
When you upload a file, Licklider makes an initial guess at the type of each column by examining its values. When the type is ambiguous, it asks you to confirm.
| Type | Description | Examples |
|---|---|---|
| numeric | Continuous or discrete numbers | 1.2, 42, 0.005 |
| categorical | Text or low-cardinality labels | "Control", "Treatment A", "Male" |
| temporal | Dates or times | 2024-01-01, Day1, Week4 |
| id | High-cardinality identifiers with no analytical meaning | subject IDs, sample codes |
| boolean | True/false or binary indicators | true/false, 0/1, yes/no |
For ambiguous cases — for example, a column that could be either numeric or temporal — Licklider uses column name patterns (such as "date" or "time") and the proportion of successfully parseable values to decide. When no type fits clearly, the column is classified as categorical.
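The decision described above can be sketched roughly as follows. This is an illustration only, not Licklider's implementation: the function name `guess_type`, the 0.9 threshold, the single date format, and the rule that a numeric-looking column with a "date" or "time" name is treated as temporal are all assumptions made for the example. Only the numeric, temporal, and categorical types are covered.

```python
from datetime import datetime

# Hypothetical settings; the real name patterns and threshold are not documented here.
TEMPORAL_NAME_HINTS = ("date", "time")
PARSE_THRESHOLD = 0.9


def _parses_as_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def _parses_as_date(text):
    # Only ISO dates (YYYY-MM-DD) are tried in this sketch.
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def _share(values, predicate):
    """Proportion of values for which predicate returns True."""
    return sum(1 for v in values if predicate(v)) / len(values)


def guess_type(name, values):
    """Guess 'numeric', 'temporal', or 'categorical' for one column."""
    cleaned = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not cleaned:
        return "categorical"  # nothing to parse, so use the fallback type

    name_hints_time = any(hint in name.lower() for hint in TEMPORAL_NAME_HINTS)

    if _share(cleaned, _parses_as_date) >= PARSE_THRESHOLD:
        return "temporal"
    if _share(cleaned, _parses_as_number) >= PARSE_THRESHOLD:
        # Ambiguous case: numbers in a column whose name suggests time.
        return "temporal" if name_hints_time else "numeric"
    return "categorical"  # no type fits clearly
```

In this sketch, a column named `time_h` holding `0, 4, 8` would be guessed as temporal, while the same values in a column named `dose` would be guessed as numeric. In Licklider itself, ambiguous cases like this are the ones you may be asked to confirm.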
You do not need to pre-format columns to specific types before importing.
Column roles
Type describes what the values in a column look like. Role describes what a column does in an analysis.
The same column can take different roles in different analyses. A numeric column might be the outcome variable (value) in one analysis and the x-axis variable (x) in another. A categorical column might be the grouping variable (group) in one context and an unreferenced attribute in another. Type stays fixed; role is assigned per analysis.
Dataset-level roles
When a dataset is first set up, Licklider assigns each column one of three broad roles:
| Role | Meaning |
|---|---|
| measure | A numeric variable available as an outcome or predictor |
| identifier | A column that identifies an observation unit (a subject, sample, or animal) rather than measuring it |
| attribute | A grouping, condition, batch, or other descriptive variable |
Analysis-specific roles
When a specific figure or test is requested, columns are mapped to more precise roles. Common analysis roles include:
| Role | Meaning |
|---|---|
| value | The outcome variable for this analysis |
| group | The grouping variable (e.g., Treatment vs Control) |
| time | A time or sequence column |
| x / y | Axes for scatter plots and regression |
| event | A binary outcome indicator used in survival analysis |
| pair_column | A subject or unit ID used to match paired observations |
Roles are inferred from dataset-level assignments and column types. When inference is insufficient, Licklider asks you to specify the role explicitly through the Chat setup flow.
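One way to picture the split between type, dataset-level role, and analysis-specific role is the sketch below. The `Column` class, the example column names, and the `roles_of` helper are hypothetical and exist only for illustration; they do not describe Licklider's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    type: str          # numeric, categorical, temporal, id, or boolean
    dataset_role: str  # measure, identifier, or attribute


# Type and dataset-level role are set once per dataset.
columns = {
    "weight_g": Column("weight_g", "numeric", "measure"),
    "dose_mg": Column("dose_mg", "numeric", "measure"),
    "arm": Column("arm", "categorical", "attribute"),
    "animal_id": Column("animal_id", "id", "identifier"),
}

# Analysis-specific roles are assigned per analysis: role name -> column name.
group_comparison = {"value": "weight_g", "group": "arm"}
regression = {"x": "dose_mg", "y": "weight_g"}


def roles_of(column_name, *mappings):
    """List the analysis roles a column plays across the given mappings."""
    return [
        role
        for mapping in mappings
        for role, mapped in mapping.items()
        if mapped == column_name
    ]
```

Here `weight_g` keeps the type numeric throughout, but it is the `value` in the group comparison and the `y` in the regression.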
Required vs optional columns
No column is required in every dataset. What is required depends entirely on the analysis or figure you are running.
For most group comparison analyses (t-test, ANOVA, non-parametric alternatives), you need at minimum:
- A numeric outcome column (value)
- A column that identifies which group each observation belongs to (group)
For analyses where the structure is more specific:
| Analysis or figure | Required columns |
|---|---|
| Paired t-test / repeated measures | A subject or unit ID used for matching (pair_column) |
| Line chart | A time or sequence column |
| Scatter plot / regression | An x-axis column and a y-axis column |
| Survival analysis | A time-to-event column and a binary event indicator |
| Chi-square test | Two categorical columns |
Optional for most analyses:
- A batch or plate column (used for confounding detection)
- Additional covariates
- A label column for annotating individual points in figures
If a column required for the requested analysis is missing, Licklider returns an error identifying which column is absent and does not proceed.
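The check described above could look something like this minimal sketch. The analysis keys, the `REQUIRED_ROLES` table, and the `MissingColumnError` class are assumptions made for the example; only the role names come from this page. The chi-square case is left out because this page does not name roles for it.

```python
# Hypothetical lookup of required roles per analysis, based on the tables above.
REQUIRED_ROLES = {
    "group_comparison": ("value", "group"),
    "paired_t_test": ("pair_column",),
    "line_chart": ("time",),
    "scatter": ("x", "y"),
    "survival": ("time", "event"),
}


class MissingColumnError(ValueError):
    """Raised when a required role has no column assigned (illustrative)."""


def check_required(analysis, mapping):
    """Raise if any role required by the analysis is unassigned in mapping."""
    missing = [role for role in REQUIRED_ROLES[analysis] if not mapping.get(role)]
    if missing:
        raise MissingColumnError(
            f"{analysis}: missing required column role(s): {', '.join(missing)}"
        )
```

Calling `check_required("survival", {"time": "days"})` raises an error that names `event` as the missing role, which mirrors the stop-and-report behavior described above.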
How Licklider asks about columns
When the required column assignments cannot be determined from the data and dataset setup alone, Licklider asks. The setup flow presents the available columns as options and asks you to confirm or correct the assignment.
For example:
- "Which column contains your outcome variable?"
- "Which column identifies the group each observation belongs to?"
- "Which column identifies the subject across time points?"
Licklider cannot always determine from column names and values alone which scientific role a column should play. Observation unit, pairing, replication structure, and causal meaning may still require your input. If those assignments are wrong, Licklider may carry the wrong mapping into analysis setup, quality checks, figures, and interpretation.
You do not need to rename columns before importing. Licklider resolves assignments through this flow regardless of what your columns are named.
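As a rough sketch of the setup flow described above, the function below either returns a column or a question to ask. Everything here is hypothetical: the `resolve_role` name, the pairing of each example question with a role, and the simplification that a single candidate is accepted without confirmation.

```python
# Example questions from this page, paired with roles as an assumption.
QUESTIONS = {
    "value": "Which column contains your outcome variable?",
    "group": "Which column identifies the group each observation belongs to?",
    "pair_column": "Which column identifies the subject across time points?",
}


def resolve_role(role, candidates):
    """Return (column, None) if one candidate fits, else (None, question)."""
    if len(candidates) == 1:
        return candidates[0], None
    options = ", ".join(candidates) if candidates else "no obvious candidates"
    return None, f"{QUESTIONS[role]} Options: {options}"
```

With two numeric candidates such as `weight_g` and `dose_mg`, the sketch returns the outcome question with both columns listed as options rather than picking one.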
Column naming
Licklider normalizes column names on import:
- Unicode is normalized (NFKC)
- Invisible characters such as zero-width spaces are removed
- Leading and trailing whitespace is trimmed
If two columns produce the same name after normalization, both are retained but the collision is recorded as a warning. This may cause unexpected behavior in analyses that reference those columns by name. Rename the columns before importing to avoid this.
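To check your own file for collisions before importing, a small script along these lines may help. It approximates the three normalization steps listed above using Python's standard library. The set of invisible characters is an assumption, since this page only names zero-width spaces, so results may differ from Licklider's in edge cases.

```python
import unicodedata
from collections import defaultdict

# Assumed set of invisible characters: zero-width space, non-joiner, joiner,
# word joiner, and byte order mark.
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def normalize_name(raw):
    """Apply NFKC, drop invisible characters, then trim whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    text = "".join(ch for ch in text if ch not in INVISIBLE)
    return text.strip()


def find_collisions(raw_names):
    """Map each normalized name to the raw names that collapse into it."""
    seen = defaultdict(list)
    for raw in raw_names:
        seen[normalize_name(raw)].append(raw)
    return {norm: raws for norm, raws in seen.items() if len(raws) > 1}
```

For example, passing the header row `["Dose", "Dose ", "Arm"]` to `find_collisions` reports that the first two names both normalize to `Dose`, so one of them should be renamed.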
Column names are not otherwise restricted. Spaces, special characters, and non-English characters are all accepted.
What this page does not cover
- Handling ID, batch, and timepoint columns specifically — see ID, Batch, and Timepoint Columns
- Declaring what each row in your dataset represents — see Observation Unit Declaration
- Mapping variables to analysis roles in the Data Contract — see Variable and ID Mapping
- Common errors that arise from column structure — see Common Import Errors
Design Rationale & References
This page follows a simple rule: columns that matter to the analysis should be represented and confirmed explicitly, rather than assumed from names alone. That is why Licklider separates column type from column role, asks for confirmation when inference is insufficient, and treats observation-level identifiers as analytically important structure rather than as ordinary labels.
- Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10), 1-23. https://doi.org/10.18637/jss.v059.i10
- Lazic, S. E. (2010). The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis? BMC Neuroscience, 11, 5. https://doi.org/10.1186/1471-2202-11-5