Compute distances per cluster
Arguments
- clustering
Named vector with token IDs as names and (HDBSCAN) clusters as values. We assume that each cluster has at least 8 items.
- dists
Long format table with one row per pair of distinct tokens, containing the distance between them, the cluster that the first token belongs to and whether both tokens belong to the same cluster.
- k
Number of nearest neighbours to consider. The value used is the maximum distance among them, i.e. the distance to the k-th nearest neighbour.
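As an illustration of the input shapes described above, the following sketch builds a clustering vector and a long format dists table from a small distance matrix. The token IDs, the toy distances and the column names (token, other, distance, cluster, same_cluster) are assumptions made for the example; the names the function actually expects are not stated here.

```r
# Toy symmetric distance matrix for four tokens (hypothetical IDs).
ids <- c("t1", "t2", "t3", "t4")
m <- matrix(
  c(0.0, 0.1, 0.8, 0.9,
    0.1, 0.0, 0.7, 0.6,
    0.8, 0.7, 0.0, 0.2,
    0.9, 0.6, 0.2, 0.0),
  nrow = 4, byrow = TRUE, dimnames = list(ids, ids)
)

# Named vector: token IDs as names, cluster labels as values.
clustering <- c(t1 = "1", t2 = "1", t3 = "2", t4 = "2")

# Long format: one row per ordered pair of tokens.
# Column names are assumed for illustration only.
dists <- as.data.frame(as.table(m), stringsAsFactors = FALSE)
names(dists) <- c("token", "other", "distance")

# Drop the pairs that match a token with itself.
dists <- dists[dists$token != dists$other, ]

# Cluster of the first token, and whether both tokens share a cluster.
dists$cluster <- unname(clustering[dists$token])
dists$same_cluster <- unname(
  clustering[dists$token] == clustering[dists$other]
)
```

The toy clusters are far smaller than the 8 items assumed for real input; they only show the structure of the two arguments.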
Value
A tibble with one row per cluster and various distance-derived values:
- min_, mean_ and max_identicals
Minimum, mean and maximum number of identical tokens per token in the cluster.
- min_, mean_ and max_k8
Minimum, mean and maximum distance between each token in the cluster and its 8th nearest neighbour.
- min_, mean_, max_ and sd_inner_dist
Minimum, mean and maximum distance between each token of the cluster and all other tokens in the same cluster, together with the standard deviation of those distances.
- min_, mean_, max_ and sd_outer_dist
Minimum, mean and maximum distance between each token of the cluster and all tokens in other clusters, together with the standard deviation of those distances.
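The summaries listed above can be sketched in base R as follows. This is a minimal illustration, not the function's actual implementation. It rests on several assumptions: the column names of dists, the output names min_k, mean_k and max_k, the use of k = 2 so that the toy data has enough neighbours, and the choice to search for nearest neighbours among all tokens rather than only within the cluster. The identicals columns are left out.

```r
# Toy data: tokens placed on a line; distance = absolute difference.
pos <- c(a = 0, b = 1, c = 2, d = 10, e = 11, f = 13)
clustering <- c(a = "1", b = "1", c = "1", d = "2", e = "2", f = "2")

# Long format table with assumed column names.
dists <- expand.grid(token = names(pos), other = names(pos),
                     stringsAsFactors = FALSE)
dists <- dists[dists$token != dists$other, ]
dists$distance <- unname(abs(pos[dists$token] - pos[dists$other]))
dists$cluster <- unname(clustering[dists$token])
dists$same_cluster <- unname(
  clustering[dists$token] == clustering[dists$other]
)

k <- 2  # small k for the toy data

# Distance from each token to its k-th nearest neighbour
# (assumption: neighbours are searched among all tokens).
kth <- tapply(dists$distance, dists$token, function(d) sort(d)[k])

# One row of summaries for a single cluster label.
summarise_cluster <- function(cl) {
  rows <- dists[dists$cluster == cl, ]
  inner_d <- rows$distance[rows$same_cluster]
  outer_d <- rows$distance[!rows$same_cluster]
  k_dist <- kth[names(clustering)[clustering == cl]]
  data.frame(
    cluster = cl,
    min_k = min(k_dist), mean_k = mean(k_dist), max_k = max(k_dist),
    min_inner_dist = min(inner_d), mean_inner_dist = mean(inner_d),
    max_inner_dist = max(inner_d), sd_inner_dist = sd(inner_d),
    min_outer_dist = min(outer_d), mean_outer_dist = mean(outer_d),
    max_outer_dist = max(outer_d), sd_outer_dist = sd(outer_d)
  )
}

# One row per cluster.
result <- do.call(rbind, lapply(unique(clustering), summarise_cluster))
```

Because dists holds each pair in both directions, every inner distance is counted twice here. This leaves the minimum, mean and maximum unchanged but slightly affects the standard deviation.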