K-means Clustering Plot
Use the k-means clustering plot as an exploratory cluster overlay, while keeping the choice of k, the plotted-axis scope, and the limits of clustering explicit.
Figure purpose
The k-means clustering plot shows one current clustering result on a two-axis scatter-style view. Each point is an observation, colored by its assigned cluster in the current k-means solution.
It is useful when you want to inspect how one chosen partition looks in the plotted space and whether the assigned groups appear compact, overlapping, or difficult to separate on the currently displayed axes.
When to use or avoid
Use this page when cluster assignment itself is part of the exploratory question. It can be helpful for seeing whether one chosen k-means solution produces compact-looking groups or obvious overlaps in the displayed view.
Avoid treating the figure as proof that the data contain true clusters. The choice of k, the initialization, the feature scaling, and the displayed axes all influence what the chart looks like.
This figure should be read as an exploratory overlay, not as an automatic discovery guarantee. A visually separated result in two dimensions can still be unstable, and a visually overlapping result does not prove that no useful structure exists in the full feature set.
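The instability point can be made concrete with a minimal Lloyd's-algorithm sketch (pure Python, illustrative only, not Licklider's implementation). Two deterministic initializations on the same four points converge to different stable partitions with very different within-cluster error, which is exactly why one plotted solution should not be read as a discovery guarantee:

```python
# Minimal Lloyd's k-means sketch. All names here are illustrative,
# not part of any Licklider API.

def kmeans(points, centroids, iters=100):
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = []
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels.append(d.index(min(d)))
        # Update step: each centroid moves to the mean of its members.
        new = []
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                new.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new.append(centroids[k])
        if new == centroids:  # converged to a (possibly local) optimum
            break
        centroids = new
    sse = sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
              for p in points)
    return labels, centroids, sse

# A wide, flat rectangle: the left/right split is optimal, but a
# top/bottom initialization is also a stable fixed point of k-means.
pts = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
labels_a, _, sse_a = kmeans(pts, [(0.0, 0.5), (4.0, 0.5)])  # left/right init
labels_b, _, sse_b = kmeans(pts, [(2.0, 0.0), (2.0, 1.0)])  # top/bottom init
print(labels_a, sse_a)  # [0, 0, 1, 1] 1.0
print(labels_b, sse_b)  # [0, 1, 0, 1] 16.0
```

Both runs terminate without error, yet they disagree on every pairing; only rerunning with varied initializations reveals that the second solution is a local optimum.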
What the figure shows
The current figure surface is intentionally narrow. It shows:
- A two-axis scatter-style projection of the data
- Cluster-colored points for one chosen k-means partition
- The current plotted view only, not a full clustering report
This means the chart answers a limited question: what does the current partition look like on the axes now being displayed? It does not by itself answer whether k was well chosen, whether the partition is stable across reruns, or whether another projection would tell a different story.
Required columns
- Multiple numeric variables suitable for clustering
- A reading task where the currently plotted axes are enough to inspect the displayed partition
K-means is distance-based, so the input variables should be on scales that make Euclidean-style distance comparisons meaningful. If one variable dominates the scale, it can dominate the clustering result as well.
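A small numeric sketch (illustrative data, standard z-score standardization) shows how one large-scale variable can swamp the distance calculation before any clustering happens:

```python
# Why scale matters for Euclidean-style distance. Illustrative values,
# not drawn from any real dataset.

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

# Two features: one in the hundreds, one a fraction in [0, 1].
a_raw = (520.0, 0.10)
b_raw = (480.0, 0.90)
print(sq_dist(a_raw, b_raw))  # 1600.64: the large-scale feature dominates

def zscore(column):
    m = sum(column) / len(column)
    sd = (sum((x - m) ** 2 for x in column) / len(column)) ** 0.5
    return [(x - m) / sd for x in column]

cols = list(zip(a_raw, b_raw))                  # per-feature columns
scaled = list(zip(*[zscore(c) for c in cols]))  # back to per-point rows
print(sq_dist(*scaled))  # 8.0: both features now contribute equally
```

Before standardization the second feature contributes 0.64 of 1600.64 to the squared distance; after standardization each feature contributes half, so both can shape the partition.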
The current implementation renders a cluster-colored two-axis view. This page should stay narrower than a broad clustering-analysis suite: it explains how to read the figure, not how to validate every clustering choice.
Related statistics or disclosure
K-means outputs should be read as one chosen model view. They do not prove that the observed grouping is the only valid partition or the ground truth of the data.
The choice of k matters. So do scaling and the axes used for display. Those choices shape the picture just as much as the raw clustering result does.
Licklider does not automatically prove that the selected k is the "correct" number of clusters. It also does not automatically prove that the plotted two-axis separation reflects the full multivariate structure better than another projection would.
How to read the result
Read the plot in three steps:
- Check whether points assigned to the same cluster look compact or visibly diffuse in the displayed space.
- Check whether different cluster colors are clearly separated or heavily overlapping on the current axes.
- Ask whether the displayed axes are the right view for your scientific question, or whether a projection such as PCA Biplot would better summarize the structure.
If clusters look clean only after changing scaling, changing axes, or trying several values of k, that is itself part of the interpretation. The visual pattern is conditional on those choices.
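One reason trying several values of k needs care: within-cluster sum of squares (WSS) falls mechanically as k grows, so a lower WSS alone never justifies a larger k. A tiny 1-D example (illustrative data) makes this visible; elbow-style reading looks for diminishing returns, not for the raw minimum:

```python
# Within-cluster sum of squares for fixed partitions of 1-D data.
# Illustrative values only.

def wss(clusters):
    total = 0.0
    for c in clusters:
        m = sum(c) / len(c)
        total += sum((x - m) ** 2 for x in c)
    return total

data = [0.0, 1.0, 9.0, 10.0]
print(wss([data]))                       # 82.0 with k=1
print(wss([[0.0, 1.0], [9.0, 10.0]]))   # 1.0 with k=2
print(wss([[x] for x in data]))         # 0.0 with k=4: one cluster per point
```

The k=4 solution has the lowest error yet carries no structure at all, which is why a solution that only looks clean after sweeping k should be interpreted conditionally.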
Design rationale and references
Licklider frames this figure as an exploratory overlay because k-means returns one partition under one set of modeling choices rather than a formal proof of natural group structure. The result can change with k, initialization, scaling, and feature representation, so the figure is best used to inspect a chosen solution rather than to claim that the data contain uniquely determined clusters [1, 2].
The page also keeps the scope anchored to a two-axis view on purpose. High-dimensional clustering cannot be fully judged from one projection, and two-dimensional displays can either exaggerate or hide separation depending on which variables are shown [2, 3].
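The hiding effect is easy to construct. In this sketch (illustrative coordinates), two groups coincide exactly on the two plotted axes but are cleanly separated on a third, undisplayed variable:

```python
# Two groups that overlap completely in the plotted (x, y) view but
# separate on a third axis. Illustrative data only.

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

group1 = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
group2 = [(0.0, 0.0, 5.0), (1.0, 1.0, 5.0)]

p, q = group1[0], group2[0]
print(sq_dist(p[:2], q[:2]))  # 0.0: identical in the plotted two-axis view
print(sq_dist(p, q))          # 25.0: well separated in the full space
```

The converse also holds: a projection chosen to maximize spread can make incidental variation look like separation, which is why the page treats the two-axis view as one view among several.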
Methodological foundations
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, pp. 281-297). -> Original k-means reference and the basis for treating the result as a partition defined by iterative centroid updates rather than as a discovery guarantee.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd ed., ch. 14). Springer. -> Standard reference on clustering, including the dependence of k-means on representation, scale, and modeling choices.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning (2nd ed., ch. 12). Springer. -> Accessible explanation of clustering as an exploratory method and of why visual projections can misrepresent higher-dimensional structure.
Current support boundary
- This figure shows one current k-means partition on one current two-axis view; it is not a full clustering validation report.
- Licklider does not automatically determine the scientifically correct number of clusters.
- Licklider does not automatically prove that the current solution is stable across different initializations, feature scalings, or alternative subsets of variables.
- Licklider does not automatically detect whether apparent separation in the plotted view disappears in the full feature space, or whether overlapping points in the plot would separate under a different projection.
- The figure is therefore best used for exploratory pattern reading, not as standalone evidence that true biological or experimental clusters have been established.
Alternative figures
- Use PCA Biplot when a reduced projection is more useful than an explicit clustering overlay.
- Use Hierarchical Clustering Heatmap when the exploratory question is closer to dendrogram-style grouping than k-means partitioning.
- Use Heatmap when the matrix structure itself matters more than one clustering overlay.
TODO (Phase02+)
- Expand only if the public product surface later exposes clearer controls or diagnostics around k choice and clustering stability.