
The function expects a dataframe with at least a column of token IDs (e.g. _id), a column holding one character vector of context words per row (e.g. cws), and a column with the cluster names (e.g. cluster). The example below also shows how to turn ;-separated values into character vectors within a tibble.

Usage

cwsForClusters(variables, cws_column, cluster_column, b = 1)

Arguments

variables

Dataframe with IDs, clusters and lists of context words

cws_column

Character string: Name of the column with the character vectors (one per row) of context words

cluster_column

Character string: Name of the column with the cluster names (the column must be a factor)

b

Weight used when computing the F-score (default: 1)
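To illustrate what such a weight usually does, here is a minimal sketch of the generic weighted F-score. It is not taken from the package source: the helper name weighted_fscore and the assumption that b plays the role of the usual beta weight between precision and recall are both hypothetical.

```r
# Generic weighted F-score (hypothetical helper, not the package's own code).
# Assumes b acts as the usual beta: b > 1 favours recall, b < 1 favours precision.
weighted_fscore <- function(precision, recall, b = 1) {
  denom <- b^2 * precision + recall
  # Return 0 when both precision and recall are 0, instead of NaN
  ifelse(denom == 0, 0, (1 + b^2) * precision * recall / denom)
}

weighted_fscore(0.5, 1)         # b = 1: harmonic mean of precision and recall
weighted_fscore(0.5, 1, b = 2)  # b = 2: recall counts more heavily
```

With b = 1 the score reduces to the plain harmonic mean, which matches the default in the Usage section.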

Value

A tibble with one row per context word per cluster, with frequency information.

Examples

if (FALSE) {

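# Split the ";"-separated strings in cws into one character vector per row
variables <- dplyr::mutate(variables, cws = stringr::str_split(cws, ";"))
# Summarise the context words per cluster, using the default weight b = 1
cwsForClusters(variables, "cws", "cluster")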

}
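For readers without a dataset at hand, the following base-R sketch builds a toy input of the shape described above. The token IDs, context words and cluster labels are invented for illustration. Whether cwsForClusters accepts a base data.frame rather than a tibble is an assumption that has not been checked here; converting with tibble::as_tibble() first may be the safer choice.

```r
# Hypothetical toy data: three tokens with ";"-separated context words.
variables <- data.frame(
  `_id` = c("tok1", "tok2", "tok3"),
  cws = c("cup;drink;hot", "drink;milk", "bank;river"),
  cluster = factor(c("1", "1", "2")),  # cluster column must be a factor
  check.names = FALSE,                 # keep the "_id" column name as is
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row,
# so cws becomes a list column (base-R counterpart of stringr::str_split).
variables$cws <- strsplit(variables$cws, ";", fixed = TRUE)

# Inspect the resulting list column
str(variables$cws)
```

The call would then be cwsForClusters(variables, "cws", "cluster"), as in the example above.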