Set up Concordancer — setupConcordancer • semcloud

Prepare the dataframe used by getContext from the context-word and PPMI files of a lemma.

Usage

setupConcordancer(
  lemma = "",
  input_dir = "",
  cws_detail_path = file.path(input_dir, paste0(lemma, ".cws.detail.tsv")),
  ppmi_path = file.path(input_dir, paste0(lemma, ".ppmi.tsv")),
  pmi_columnname = "pmi_4",
  distance_corrector_func = function(word) !stringr::str_starts(word, "<"),
  lemma_from_tid_fun = function(tid) paste(stringr::str_split(tid, "/")[[1]][-c(3, 4)],
    collapse = "/")
)
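
As a minimal sketch of how the default file paths are derived from lemma and input_dir, assuming a hypothetical lemma "heffen" and a hypothetical directory "data/heffen" (neither value comes from the package):

```r
# Hypothetical values, for illustration only
lemma <- "heffen"
input_dir <- file.path("data", "heffen")

# Same expressions as the defaults in the signature above
cws_detail_path <- file.path(input_dir, paste0(lemma, ".cws.detail.tsv"))
ppmi_path <- file.path(input_dir, paste0(lemma, ".ppmi.tsv"))

print(cws_detail_path)
print(ppmi_path)
```

If the files follow this naming scheme, supplying lemma and input_dir should be enough; otherwise cws_detail_path and ppmi_path can be given explicitly.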

Arguments

lemma: Name of the lemma, used to build the default filenames.
input_dir: Directory where the files are stored.
cws_detail_path: Path to a dataframe with one row per context word per token, with information on each context word in relation to its token. Created by listContextWords in the semasioFlow Python module.
ppmi_path: Path to a dataframe with one context word per row and frequency information.
pmi_columnname: Name (or prefix) of the column in the dataframe found in ppmi_path where weighting values are stored.
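distance_corrector_func: Function applied to the values of the word column of the dataframe in cws_detail_path. It returns TRUE for the rows to keep when recalculating distances between words. By default, words starting with "<" are left out.
lemma_from_tid_fun: Function to extract the target lemma from the tokenID. By default, it splits the tokenID on "/" and drops the third and fourth fields.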
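
To illustrate what the two default functions do, here is a small sketch using base R equivalents of the stringr calls in the signature. The names keep_word and lemma_from_tid, the sample words and the token ID format are assumptions made for this example, not part of the package.

```r
# Base R equivalent of the default distance_corrector_func:
# TRUE for words that do NOT start with "<"
keep_word <- function(word) !startsWith(word, "<")

# Base R equivalent of the default lemma_from_tid_fun:
# split the token ID on "/" and drop the third and fourth fields
lemma_from_tid <- function(tid) {
  parts <- strsplit(tid, "/", fixed = TRUE)[[1]]
  paste(parts[-c(3, 4)], collapse = "/")
}

words <- c("the", "<s>", "cat")  # hypothetical values of the word column
print(keep_word(words))          # TRUE FALSE TRUE

tid <- "heffen/verb/file01/27"   # hypothetical token ID
print(lemma_from_tid(tid))       # "heffen/verb"
```

Note that the default lemma_from_tid_fun, as written in the signature, only looks at the first element of its input, so it handles one tokenID at a time. A custom function with a different tokenID layout would need to follow the same convention.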

Value

Enriched dataframe with one row per context word per token, weight values, corrected distances and a column indicating the right target lemma (in case you have more than one).