Summarize HDBSCAN data per cluster

Usage

clusterHDBSCAN(m)

Arguments

m

Tibble with one token per row and HDBSCAN information, typically the coords element of a model resulting from summarizeHDBSCAN.

Value

Tibble with one row per cluster and various HDBSCAN-derived values:

min_, mean_, max_ and sd_cws

Minimum, mean and maximum, as well as standard deviation, of the number of first-order context words per token in that cluster.

min_, mean_, max_ and sd_eps

Minimum, mean and maximum, as well as standard deviation, of the \(\epsilon\) value of the tokens in that cluster.

size, rel_size

Absolute number of tokens in the cluster and proportion of modelled tokens covered by the cluster.

deeper_than_noise

Proportion of tokens in that cluster with an \(\epsilon\) value lower than the minimum \(\epsilon\) of noise tokens in that model.

cw_tokens, _types, _ttratio

Values computed over the union of the first-order context words of the tokens in that cluster: number of tokens, number of types, and type-token ratio.
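
Examples

The following is a minimal sketch of how per-cluster values like the ones described above could be derived with dplyr. It does not call clusterHDBSCAN and is not its implementation. The toy tibble and its column names (token_id, cluster, eps, cws) are assumptions made for illustration, as are the use of "0" as the noise label, the inclusion of noise tokens in the denominator of rel_size, and the definition of the type-token ratio as types divided by tokens.

```r
library(dplyr)
library(tibble)

# Toy input with one row per token. All column names are hypothetical.
tokens <- tibble(
  token_id = paste0("tok", 1:8),
  cluster  = c("0", "0", "1", "1", "1", "2", "2", "2"), # "0" = noise (assumption)
  eps      = c(0.9, 0.7, 0.3, 0.5, 0.8, 0.2, 0.4, 0.6),
  cws      = list(                                       # first-order context words
    c("a", "b"), c("c"), c("a", "b", "c"), c("a"), c("b", "d"),
    c("e"), c("e", "f"), c("e", "f", "g")
  )
)

# Lowest epsilon among noise tokens: the threshold for "deeper than noise".
min_noise_eps <- min(tokens$eps[tokens$cluster == "0"])

per_cluster <- tokens %>%
  mutate(n_cws = lengths(cws)) %>%          # context words per token
  group_by(cluster) %>%
  summarise(
    size     = n(),
    min_cws  = min(n_cws), mean_cws = mean(n_cws),
    max_cws  = max(n_cws), sd_cws   = sd(n_cws),
    min_eps  = min(eps),   mean_eps = mean(eps),
    max_eps  = max(eps),   sd_eps   = sd(eps),
    # share of tokens whose epsilon is below the noise minimum
    deeper_than_noise = mean(eps < min_noise_eps),
    cw_tokens = sum(n_cws),                 # context word occurrences
    cw_types  = n_distinct(unlist(cws)),    # distinct context words
    .groups = "drop"
  ) %>%
  mutate(
    rel_size   = size / sum(size),          # denominator includes noise here
    cw_ttratio = cw_types / cw_tokens
  )

per_cluster
```

In this toy data the minimum noise \(\epsilon\) is 0.7, so every token in cluster "2" counts as deeper than noise, whereas only two of the three tokens in cluster "1" do.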