Compute the gap statistic for clustered data
compute_gapstat(df, clusters, gap_B = 50, max_k = 14)
df: the data used to compute the clusters
clusters: output of compute_clusters() or fastcluster::hclust()
gap_B: number of bootstrap samples for cluster::clusGap(). Default is 50.
max_k: maximum number of clusters for which to compute the statistic. Default is 14.
A data frame containing the Tab component of the cluster::clusGap() results (columns logW, E.logW, gap and SE.sim), plus a column k giving the number of clusters.
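For reference, cluster::clusGap() defines the Tab columns as follows. Here W_k is the within-cluster dispersion of the data for k clusters, W*_{kb} is the same quantity for the b-th of B simulated reference data sets (B is the number of bootstrap samples, i.e. gap_B), and sd_k is the sample standard deviation of the B values log W*_{kb}. As a check against the example output, the first row gives 4.316297 - 4.134552 = 0.181745, matching its gap value. The last line is the selection rule proposed by Tibshirani, Walther and Hastie (2001); it is not applied by this function and is included only as context.

```latex
\begin{aligned}
\texttt{logW}_k   &= \log W_k \\
\texttt{E.logW}_k &= \frac{1}{B}\sum_{b=1}^{B} \log W^{*}_{kb} \\
\texttt{gap}_k    &= \texttt{E.logW}_k - \texttt{logW}_k \\
\texttt{SE.sim}_k &= \sqrt{1 + 1/B}\;\mathrm{sd}_k \\
\hat{k}           &= \min\{\, k : \texttt{gap}_k \ge \texttt{gap}_{k+1} - \texttt{SE.sim}_{k+1} \,\}
\end{aligned}
```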
data_to_cluster <- iris[c("Petal.Length", "Sepal.Length")]
dmat <- compute_dmat(data_to_cluster, "euclidean", TRUE)
clusters <- compute_clusters(dmat, "complete")
gap_results <- compute_gapstat(scale(data_to_cluster), clusters)
head(gap_results)
#>       logW   E.logW       gap     SE.sim k
#> 1 4.134552 4.316297 0.1817452 0.02672406 1
#> 2 3.671650 4.309448 0.6377978 0.02737256 2
#> 3 3.177399 4.301774 1.1243753 0.02888930 3
#> 4 2.983790 4.293219 1.3094294 0.02902321 4
#> 5 2.865874 4.287080 1.4212062 0.02833092 5
#> 6 2.758752 4.280238 1.5214858 0.02875991 6
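The example above relies on the package's own helpers. As a self-contained sketch using only the cluster package and base R, the code below builds a table with the same columns by calling cluster::clusGap() directly, then uses cluster::maxSE() to pick a number of clusters from the gap and SE.sim columns. The helper hclust_cut() is hypothetical, and rebuilding the tree for every simulated reference data set is an assumption about how hierarchical clustering can be plugged into clusGap(), not a description of how compute_gapstat() works internally. The values need not match the output above.

```r
library(cluster)  # provides clusGap() and maxSE()

# Hypothetical FUNcluster (not part of the package): clusGap() calls it on the
# observed data and on each simulated reference data set, so the tree is
# rebuilt from that data every time and then cut into k groups.
hclust_cut <- function(x, k, method = "complete") {
  tree <- stats::hclust(stats::dist(x), method = method)
  list(cluster = stats::cutree(tree, k = k))
}

set.seed(123)  # the reference data sets are drawn at random
x <- scale(iris[c("Petal.Length", "Sepal.Length")])
gs <- clusGap(x, FUNcluster = hclust_cut, K.max = 14, B = 50, verbose = FALSE)

# Same layout as the table above: the Tab matrix plus a k column
gap_tab <- as.data.frame(gs$Tab)
gap_tab$k <- seq_len(nrow(gap_tab))

# Choose k from the gap curve and its standard errors
k_tibs <- maxSE(gap_tab$gap, gap_tab$SE.sim, method = "Tibs2001SEmax")
k_first <- maxSE(gap_tab$gap, gap_tab$SE.sim, method = "firstSEmax")
c(Tibs2001SEmax = k_tibs, firstSEmax = k_first)
```

Tibs2001SEmax applies the rule of Tibshirani et al. (2001); firstSEmax, the default of maxSE(), instead starts from the first local maximum of the gap curve and moves left to the smallest k whose gap lies within one SE.sim of it.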