Map context words and HDBSCAN clusters — cwsForClusters • semcloud

The function expects a dataframe with at least a column of token IDs (e.g. _id), a column with character vectors of context words (e.g. cws) and a column with cluster names (e.g. cluster). The example below also shows how to turn ;-separated values into character vectors within a tibble.

Usage

cwsForClusters(variables, cws_column, cluster_column, b = 1)

Arguments

variables: Dataframe with IDs, clusters and lists of context words
cws_column: Character string: Name of the column with the character vectors (one per row) of context words
cluster_column: Character string: Name of the column with the cluster names (the column must be a factor)
b: Weight for computing the F-score (default: 1)
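
As a rough illustration of what the weight b does, the sketch below uses the standard weighted F-score, which combines precision and recall and gives recall b times the importance of precision. This is an assumption: the page does not state the formula cwsForClusters uses internally, and f_score is a hypothetical helper, not part of semcloud.

```r
# Hypothetical helper, NOT a semcloud function: the standard weighted F-score.
# Assumption: cwsForClusters uses `b` in this conventional way.
f_score <- function(precision, recall, b = 1) {
  (1 + b^2) * precision * recall / (b^2 * precision + recall)
}

# With b = 1, precision and recall count equally (harmonic mean).
balanced <- f_score(precision = 0.5, recall = 1, b = 1)

# With b = 2, recall weighs more, so the score moves toward the recall value.
recall_heavy <- f_score(precision = 0.5, recall = 1, b = 2)

print(c(balanced = balanced, recall_heavy = recall_heavy))
```

Under this reading, values of b above 1 favour recall and values below 1 favour precision.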

Value

A tibble with one row per context word per cluster, with frequency information.

Examples

if (FALSE) {

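# Split the ;-separated context words into one character vector per row
variables <- dplyr::mutate(variables, cws = stringr::str_split(cws, ";"))
# Map the context words to the clusters, using the default weight b = 1
cwsForClusters(variables, "cws", "cluster")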

}
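
To make the expected input shape concrete, here is a minimal sketch that builds a toy tibble and prepares it as described above. All IDs, context words and cluster labels are invented for illustration; only the column layout (token IDs, a list-column of context words, a factor of clusters) comes from the description.

```r
library(tibble)
library(dplyr)
library(stringr)

# Toy input: every value here is made up for illustration.
variables <- tibble(
  `_id` = c("tok_1", "tok_2", "tok_3"),
  cws = c("drink/verb;hot/adj", "drink/verb;cup/noun", "bean/noun"),
  cluster = factor(c("1", "1", "2"))  # cluster column stored as a factor
)

# str_split() returns a list with one character vector per row,
# so cws becomes a list-column.
variables <- mutate(variables, cws = str_split(cws, ";"))

first_cws <- variables$cws[[1]]
print(first_cws)
print(is.factor(variables$cluster))
```

A tibble prepared this way has the shape that cwsForClusters(variables, "cws", "cluster") is documented to expect.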