Summarize HDBSCAN data per cluster
Arguments
- m
Tibble with one token per row and HDBSCAN information. The coords element of a model resulting from summarizeHDBSCAN.
Value
Tibble with one row per cluster and various HDBSCAN-derived values:
- min_, mean_, max_ and sd_cws
Minimum, mean and maximum, as well as standard deviation, of the number of first-order context words per token in that cluster.
- min_, mean_, max_ and sd_eps
Minimum, mean and maximum, as well as standard deviation, of the \(\epsilon\) value of the tokens in that cluster.
- size, rel_size
Absolute number of tokens in the cluster and proportion of modelled tokens covered by the cluster.
- deeper_than_noise
Proportion of tokens in that cluster with an \(\epsilon\) value lower than the minimum \(\epsilon\) of noise tokens in that model.
- cw_tokens, _types, _ttratio
Counts over the union of first-order context words of the tokens in that cluster: number of tokens, number of types, and type-token ratio.
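
The values listed above can be illustrated with a minimal dplyr sketch on a toy tibble. This is not the package's implementation: the column names (token, cluster, eps, cws), the use of "0" as the noise label, and the reading of cw_tokens as the total count of context-word occurrences in the cluster are all assumptions made for the example.

```r
library(dplyr)
library(tibble)

# Toy input with one row per token. All column names are hypothetical:
# cluster "0" is assumed to mark noise, and cws is assumed to be a list
# column holding the first-order context words of each token.
m <- tibble(
  token   = paste0("tok", 1:6),
  cluster = c("0", "0", "1", "1", "1", "2"),
  eps     = c(0.9, 0.7, 0.3, 0.5, 0.8, 0.2),
  cws     = list(c("a", "b"), "c", c("a", "d", "e"), c("a", "d"), "b", c("f", "g"))
)

# Minimum epsilon among the noise tokens of this model.
noise_min_eps <- min(m$eps[m$cluster == "0"])

cluster_summary <- m %>%
  mutate(n_cws = lengths(cws)) %>%          # context words per token
  group_by(cluster) %>%
  summarise(
    min_cws = min(n_cws), mean_cws = mean(n_cws),
    max_cws = max(n_cws), sd_cws = sd(n_cws),
    min_eps = min(eps), mean_eps = mean(eps),
    max_eps = max(eps), sd_eps = sd(eps),
    size = n(),
    # share of tokens whose epsilon is below the noise minimum
    deeper_than_noise = mean(eps < noise_min_eps),
    cw_tokens = sum(n_cws),                 # assumed: total occurrences
    cw_types = n_distinct(unlist(cws)),     # distinct context words
    .groups = "drop"
  ) %>%
  mutate(
    rel_size = size / sum(size),
    cw_ttratio = cw_types / cw_tokens
  )

print(cluster_summary)
```

Clusters with a single token, such as cluster "2" here, get NA for the standard deviations, since sd() is undefined for one observation.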