Compute distances per cluster

Usage

clusterDistance(clustering, dists, k = 8)

Arguments

clustering

Named vector with token IDs as names and (HDBSCAN) clusters as values. We assume that each cluster has at least 8 items.

dists

Long-format table with one row per pair of distinct tokens, giving the distance between them, the cluster that the first token belongs to, and whether both tokens belong to the same cluster.

k

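Number of nearest neighbours to consider for each token. The distance to the k-th nearest neighbour, i.e. the maximum distance among the k nearest neighbours, is what gets summarised. Defaults to 8.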
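
As a rough illustration of the expected input shapes, the sketch below builds a toy `clustering` vector and a long-format `dists` table in base R. The column names (`token_a`, `token_b`, `distance`, `cluster`, `same_cluster`) and the choice of one row per ordered pair are assumptions made for this example; the page does not state the names the function actually requires.

```r
set.seed(1)

# Hypothetical token IDs: two clusters of 8 tokens each
ids <- paste0("tok", 1:16)
clustering <- setNames(rep(c("1", "2"), each = 8), ids)

# Toy distance matrix from random 2D coordinates
coords <- matrix(rnorm(32), ncol = 2, dimnames = list(ids, NULL))
dmat <- as.matrix(dist(coords))

# Long format: one row per ordered pair of distinct tokens
dists <- as.data.frame(as.table(dmat), stringsAsFactors = FALSE)
names(dists) <- c("token_a", "token_b", "distance")  # assumed column names
dists <- dists[dists$token_a != dists$token_b, ]

# Cluster of the first token, and whether both tokens share a cluster
dists$cluster <- unname(clustering[dists$token_a])
dists$same_cluster <- unname(
  clustering[dists$token_a] == clustering[dists$token_b]
)

head(dists)
```

With inputs of this general shape, a call would look like `clusterDistance(clustering, dists, k = 8)`, provided the column names match what the function expects.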

Value

A tibble with one row per cluster and various distance-derived values:

min_, mean_ and max_identicals

Minimum, mean and maximum number of identical tokens per token in the cluster.

min_, mean_ and max_k8

Minimum, mean and maximum distance between each token in the cluster and its k-th nearest neighbour (the 8th, with the default k = 8).

min_, mean_, max_ and sd_inner_dist

Minimum, mean, and maximum distance, as well as their standard deviation, between each token of the cluster and all other tokens in the same cluster.

min_, mean_, max_ and sd_outer_dist

Minimum, mean, and maximum distance, as well as their standard deviation, between each token of the cluster and all tokens in other clusters.
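
The distance summaries listed above can be approximated in base R as follows. This is only a sketch under stated assumptions, not the function's actual implementation: it starts from a square distance matrix rather than the long-format `dists` table, it pools all pairwise distances before summarising (the real function may aggregate per token first), and `summarise_cluster` is a hypothetical helper. The `identicals` columns are left out.

```r
set.seed(1)

# Toy data: two clusters of 8 hypothetical tokens
ids <- paste0("tok", 1:16)
clustering <- setNames(rep(c("1", "2"), each = 8), ids)
coords <- matrix(rnorm(32), ncol = 2, dimnames = list(ids, NULL))
dmat <- as.matrix(dist(coords))
k <- 8

# Distance from each token to its k-th nearest neighbour (self excluded)
kth_dist <- vapply(
  ids,
  function(id) unname(sort(dmat[id, ids != id])[k]),
  numeric(1)
)

# Hypothetical helper: one row of summaries for cluster `cl`
summarise_cluster <- function(cl) {
  members <- names(clustering)[clustering == cl]
  others <- setdiff(ids, members)
  sub <- dmat[members, members]
  inner <- sub[upper.tri(sub)]              # pairs within the cluster
  outer <- as.vector(dmat[members, others]) # pairs across clusters
  data.frame(
    cluster = cl,
    min_k8 = min(kth_dist[members]),
    mean_k8 = mean(kth_dist[members]),
    max_k8 = max(kth_dist[members]),
    min_inner_dist = min(inner),
    mean_inner_dist = mean(inner),
    max_inner_dist = max(inner),
    sd_inner_dist = sd(inner),
    min_outer_dist = min(outer),
    mean_outer_dist = mean(outer),
    max_outer_dist = max(outer),
    sd_outer_dist = sd(outer)
  )
}

result <- do.call(rbind, lapply(unique(clustering), summarise_cluster))
result
```

A compact cluster that is well separated from the rest would be expected to show small `inner_dist` values relative to its `outer_dist` values.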