Compute Gap statistic for clustered data

compute_gapstat(df, clusters, gap_B = 50, max_k = 14)

Arguments

df

the data used to compute clusters

clusters

output of compute_clusters() or fastcluster::hclust()

gap_B

number of bootstrap samples passed to cluster::clusGap(). Default is 50.

max_k

maximum number of clusters for which to compute the statistic. Default is 14.

Value

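a data frame containing the Tab component of the cluster::clusGap() result (columns logW, E.logW, gap and SE.sim), plus a column k with the number of clusters for each row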
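
For readers unfamiliar with the Tab component, the following is a minimal sketch of how a comparable table can be produced by calling cluster::clusGap() directly. It is not the implementation of compute_gapstat(): the helper hclust_k() is hypothetical, and it assumes each data set is re-clustered with complete-linkage stats::hclust() and cut into k groups with stats::cutree(). Small values of K.max and B are used only to keep the run short.

```r
library(cluster)

# Hypothetical helper (not part of this package): clusGap() expects a function
# of (x, k) that returns a list with a `cluster` component.
hclust_k <- function(x, k) {
  hc <- hclust(dist(x), method = "complete")
  list(cluster = cutree(hc, k))
}

set.seed(42)  # the reference data sets are simulated, so fix the seed
x <- scale(iris[c("Petal.Length", "Sepal.Length")])

gap <- clusGap(x, FUNcluster = hclust_k, K.max = 6, B = 20, verbose = FALSE)

# Tab is a matrix with one row per k and columns logW, E.logW, gap, SE.sim
gap_tab <- as.data.frame(gap$Tab)
gap_tab$k <- seq_len(nrow(gap_tab))
head(gap_tab)
```

In this sketch, K.max and B play the roles that max_k and gap_B play in compute_gapstat().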

Examples

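# Select two numeric columns from iris
data_to_cluster <- iris[c("Petal.Length", "Sepal.Length")]
# Euclidean distance matrix, then complete-linkage hierarchical clustering
dmat <- compute_dmat(data_to_cluster, "euclidean", TRUE)
clusters <- compute_clusters(dmat, "complete")
# Gap statistic with the defaults gap_B = 50 and max_k = 14
gap_results <- compute_gapstat(scale(data_to_cluster), clusters)
head(gap_results)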
#>       logW   E.logW       gap     SE.sim k
#> 1 4.134552 4.316297 0.1817452 0.02672406 1
#> 2 3.671650 4.309448 0.6377978 0.02737256 2
#> 3 3.177399 4.301774 1.1243753 0.02888930 3
#> 4 2.983790 4.293219 1.3094294 0.02902321 4
#> 5 2.865874 4.287080 1.4212062 0.02833092 5
#> 6 2.758752 4.280238 1.5214858 0.02875991 6
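
One possible next step, sketched here under assumptions, is to pick a number of clusters from the gap and SE.sim columns with cluster::maxSE(). The data frame gap_head below is a hypothetical stand-in that copies only the six rows printed above, so the value it yields is not the choice for the full table; with real output, gap_results would be used instead.

```r
library(cluster)

# Hypothetical stand-in: the six rows shown by head(gap_results) above
gap_head <- data.frame(
  k      = 1:6,
  gap    = c(0.1817452, 0.6377978, 1.1243753, 1.3094294, 1.4212062, 1.5214858),
  SE.sim = c(0.02672406, 0.02737256, 0.02888930, 0.02902321, 0.02833092, 0.02875991)
)

# maxSE() returns the index of the selected row; "firstSEmax" is one of
# several selection rules it offers
best_row <- maxSE(gap_head$gap, gap_head$SE.sim, method = "firstSEmax")
best_k <- gap_head$k[best_row]
best_k
```

Other rules such as "Tibs2001SEmax" or "globalmax" can give different answers, so it is worth comparing them against a plot of gap versus k.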