Multi-omics and Compositional Data

High-dimensional biological data — gene expression, proteomics, metabolomics, microbiome abundance — shares a common structure: many features measured simultaneously, and results that need to be visualized and interpreted at scale.

Licklider is not a dedicated omics pipeline. It does not process raw sequencing reads, perform normalization for specific assay types, or run pathway enrichment. Instead, it provides the downstream statistical visualization and integrity layer once the primary analysis has been done.

This workflow therefore produces downstream outputs: imported differential results shown as volcano plots, matrix-style views such as heatmaps, and warnings or disclosures when the imported table has structure that can make standard interpretation unsafe.

Volcano plots for differential analysis

After running differential expression or similar analysis in a dedicated tool, the results — fold change and p-value for each feature — can be imported into Licklider for visualization.

A volcano plot displays:

Fold change on the x-axis (typically log<sub>2</sub> FC)
−log<sub>10</sub> p-value on the y-axis
Color-coded points: red for significant features above the fold change threshold, orange for significant but small effect, gray for non-significant
Threshold lines at the significance cutoff and the fold change cutoff

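Default thresholds are p < 0.05 and |FC| > 1.0, where FC is read on the plotted x-axis scale (typically log<sub>2</sub> FC). To use different thresholds, ask for them directly: "Set the fold change threshold to 1.5" or "Use p < 0.01."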

These defaults are intended as a readable starting view, not as universal omics cutoffs. They help separate large visible signals from the background in the first pass, while leaving room for assay-specific or study-specific thresholds when the imported result table was built under a different standard.
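
The color rules and default thresholds above can be sketched in a few lines of Python. This is an illustrative sketch, not Licklider's implementation: the function names `classify_point` and `volcano_coordinates` are hypothetical, and the fold change is assumed to already be on the plotted scale (typically log2 FC).

```python
import math


def classify_point(fold_change, p_value, p_cutoff=0.05, fc_cutoff=1.0):
    """Return the color category for one feature (hypothetical helper).

    fold_change is assumed to be on the plotted scale, typically log2 FC.
    """
    if not p_value < p_cutoff:
        return "gray"      # non-significant
    if abs(fold_change) > fc_cutoff:
        return "red"       # significant and beyond the fold change threshold
    return "orange"        # significant but small effect


def volcano_coordinates(fold_change, p_value):
    """Return (x, y): fold change on x, -log10 p-value on y.

    Assumes p_value > 0; a reported p of exactly 0 needs handling upstream.
    """
    return fold_change, -math.log10(p_value)


# Made-up example values: (feature, fold change, p-value)
features = [
    ("geneA", 2.3, 0.001),   # red
    ("geneB", 0.4, 0.01),    # orange
    ("geneC", -1.8, 0.2),    # gray
    ("geneD", -1.2, 0.03),   # red
]

for name, fc, p in features:
    print(name, classify_point(fc, p), volcano_coordinates(fc, p))
```

Passing `fc_cutoff=1.5` or `p_cutoff=0.01` mirrors the threshold requests quoted above. Changing a cutoff only moves points between categories; it does not change the imported values.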

Licklider does not determine whether the imported fold changes and p-values came from the right normalization, multiplicity correction, batch model, or feature-filtering strategy for your assay. If the upstream differential analysis is flawed, the volcano plot can still look convincing while reflecting the wrong preprocessing or model.

For more detail → see Volcano Plot.

Heatmaps for expression patterns

To visualize patterns across many features and samples simultaneously, use a heatmap. Each row is a sample and each column is a feature.

The heatmap uses the Viridis color scale. Rows can be sorted by their mean value to make overall patterns visible.

That mean-based ordering is meant to give a quick global reading of high-versus-low rows without claiming that nearby rows share the same feature pattern. It is a readability choice, not a clustering result.
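
A small sketch shows why mean-based ordering is not a clustering result. It is not Licklider's code: the function name `sort_rows_by_mean` is hypothetical, and the descending sort direction is an assumption. Two rows with the same mean end up adjacent even though their feature patterns differ.

```python
from statistics import mean


def sort_rows_by_mean(matrix, row_labels, descending=True):
    """Reorder rows by their mean value (hypothetical helper).

    The descending default is an assumption, not documented behavior.
    """
    order = sorted(
        range(len(matrix)),
        key=lambda i: mean(matrix[i]),
        reverse=descending,
    )
    return [row_labels[i] for i in order], [matrix[i] for i in order]


samples = ["s1", "s2", "s3"]
values = [
    [1.0, 9.0, 1.0],  # high only in the second feature
    [5.0, 5.0, 5.0],  # flat, highest mean
    [9.0, 1.0, 1.0],  # high only in the first feature, same mean as s1
]

labels, rows = sort_rows_by_mean(values, samples)
print(labels)
```

Here s1 and s3 sit next to each other because their means are equal, not because their profiles are similar.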

Note: The current heatmap does not perform hierarchical clustering. Rows are sorted by average value, not grouped by pattern similarity. If you need dendrograms, cluster the data externally and import the sorted result.

Licklider also does not determine automatically which subset of features is scientifically the most important to display. In very wide tables, the choice of which columns to show remains part of the analytic judgment.

For more detail → see Heatmap.

Compositional data

Microbiome abundance, cell type proportions, and similar data are compositional: the values in each sample sum to a constant, so an increase in one component's share must be offset by a decrease in the share of at least one other component.
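
The sum constraint can be seen in a short worked example. The counts are made up and the helper name `close_to_total` is hypothetical. Only one component changes in absolute terms, yet every share moves once the values are rescaled to a constant sum.

```python
def close_to_total(values, total=1.0):
    """Rescale non-negative values so they sum to `total` (hypothetical helper)."""
    s = sum(values)
    if s <= 0:
        raise ValueError("values must have a positive sum")
    return [total * v / s for v in values]


# Absolute counts for three taxa; only the first one changes.
counts_before = [100, 100, 200]
counts_after = [400, 100, 200]

shares_before = close_to_total(counts_before)
shares_after = close_to_total(counts_after)

print(shares_before)
print(shares_after)
```

The second and third taxa have identical counts in both samples, but their shares fall in the second sample because the first taxon grew.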

Many standard statistical methods assume that variables can vary independently of one another, and compositional data violates that assumption. When Licklider detects that your data may be compositional, based on value ranges, column names, or row-wise sums, it surfaces a warning.

This detection is intentionally heuristic. It is designed to catch common compositional patterns early, not to certify that every bounded multicolumn table has been classified correctly.
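
To make the idea of a heuristic concrete, here is a minimal sketch of a row-sum check. It does not reproduce Licklider's detection rules: the function name `looks_compositional`, the candidate totals (1 and 100), and the tolerance are all assumptions, and column-name hints are left out.

```python
import math


def looks_compositional(rows, targets=(1.0, 100.0), rel_tol=1e-3):
    """Flag a table whose rows are non-negative and all sum to one target.

    Hypothetical heuristic: targets and tolerance are illustrative choices.
    """
    if not rows:
        return False
    for target in targets:
        if all(
            row
            and min(row) >= 0
            and math.isclose(sum(row), target, rel_tol=rel_tol)
            for row in rows
        ):
            return True
    return False


proportions = [[0.2, 0.3, 0.5], [0.1, 0.1, 0.8]]
percentages = [[20, 30, 50], [10, 15, 75]]
raw_counts = [[12, 40, 7], [3, 90, 51]]
# Three unrelated scores on a 0 to 1 scale that happen to sum to 1.
coincidence = [[0.5, 0.25, 0.25]]

print(looks_compositional(proportions))
print(looks_compositional(percentages))
print(looks_compositional(raw_counts))
print(looks_compositional(coincidence))
```

The last table is flagged even though it is not a composition. A check like this can only raise a question, which is why the final call is left to you.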

The warning asks you to confirm:

That the data is compositional, in which case a disclosure is included in the output
Or that the data is not compositional and standard analysis is appropriate

Licklider cannot determine automatically whether a transformed table still has the compositional meaning of the original assay, whether row sums are constant only after preprocessing, or whether a bounded table is actually one true composition rather than several separate measurements that only happen to share a scale.

Those limits matter because compositional structure can make standard comparisons and regressions look more certain than they are. If the warning does not fire when it should, standard methods may still be applied to a table whose components are not independent in the usual sense.

For more detail → see Compositional Data Warning.

Working with large feature tables

Omics datasets often have many columns (features) and many rows (samples). A few practical notes:

CSV files up to 100 MB are supported via Large Dataset Mode
For heatmaps, select the columns you want to display rather than including all features
For volcano plots, you need fold change and p-value columns — these are typically computed in an external tool (DESeq2, limma, etc.) and imported as a results table
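
As a rough illustration of preparing a results table before import, the sketch below reads fold change and p-value columns from a CSV export. The column names are the ones DESeq2 (`log2FoldChange`, `pvalue`) and limma (`logFC`, `P.Value`) write by default; the helper names, the alias table, and the choice to skip `NA` rows are assumptions for this example, not Licklider behavior. Whether to import raw or adjusted p-values remains your decision.

```python
import csv
import io

# Default column names from two common tools; other tools will differ.
COLUMN_ALIASES = {
    "fold_change": ("log2FoldChange", "logFC"),  # DESeq2, limma
    "p_value": ("pvalue", "P.Value"),            # DESeq2, limma
}


def find_column(fieldnames, candidates):
    """Return the first candidate present in the header (hypothetical helper)."""
    for name in candidates:
        if name in fieldnames:
            return name
    raise KeyError(f"none of {candidates} found in {list(fieldnames)}")


def read_results(handle):
    """Return a list of (fold_change, p_value) pairs from a CSV handle."""
    reader = csv.DictReader(handle)
    fieldnames = reader.fieldnames or []
    fc_col = find_column(fieldnames, COLUMN_ALIASES["fold_change"])
    p_col = find_column(fieldnames, COLUMN_ALIASES["p_value"])
    pairs = []
    for record in reader:
        # Rows with missing values are skipped here rather than guessed.
        if record[fc_col] in ("", "NA") or record[p_col] in ("", "NA"):
            continue
        pairs.append((float(record[fc_col]), float(record[p_col])))
    return pairs


# Made-up table in a DESeq2-like layout.
sample_csv = io.StringIO(
    "gene,baseMean,log2FoldChange,pvalue\n"
    "geneA,120.5,2.3,0.001\n"
    "geneB,15.2,NA,NA\n"
    "geneC,88.0,-0.4,0.6\n"
)

results = read_results(sample_csv)
print(results)
```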

What Licklider does not do

Raw sequencing processing or alignment
Normalization specific to RNA-seq, proteomics, or metabolomics assay types
Pathway enrichment or gene set analysis
Sample clustering (heatmap rows are only sorted by mean, not grouped by pattern similarity)

It also does not verify that imported omics result tables were generated with the correct upstream normalization, batch correction, dispersion model, multiple-testing strategy, or feature-filtering policy for the assay. Those decisions remain the responsibility of the external pipeline and the researcher who interprets its output.

Design rationale and references

This workflow is intentionally downstream-first. Licklider focuses on making imported omics-style results easier to inspect, disclose, and report without pretending to replace assay-specific primary pipelines.

That is why the page separates three roles clearly: volcano plots for imported differential results, heatmaps for broad pattern review, and compositional warnings when table structure can make standard interpretation unsafe. The workflow is designed to surface risks and display choices, not to act as the source of truth for upstream omics preprocessing.

References

Aitchison, J. (1982). The statistical analysis of compositional data. Journal of the Royal Statistical Society. Series B (Methodological), 44(2), 139-177.
Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15, 550. https://doi.org/10.1186/s13059-014-0550-8

These references support two key boundaries reflected in Licklider's workflow: compositional data needs its own statistical care, and imported differential results depend heavily on the upstream model and normalization choices that produced the fold changes and p-values in the first place.

What this page does not cover

Volcano plot details → see Volcano Plot
Heatmap details → see Heatmap
Compositional data warning → see Compositional Data Warning