Prepare dataframes for getContext
Usage
setupConcordancer(
  lemma = "",
  input_dir = "",
  cws_detail_path = file.path(input_dir, paste0(lemma, ".cws.detail.tsv")),
  ppmi_path = file.path(input_dir, paste0(lemma, ".ppmi.tsv")),
  pmi_columnname = "pmi_4",
  distance_corrector_func = function(word) !stringr::str_starts(word, "<"),
  lemma_from_tid_fun = function(tid) paste(stringr::str_split(tid, "/")[[1]][-c(3, 4)],
    collapse = "/")
)
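As a minimal sketch of how the two default paths resolve, the snippet below evaluates the same expressions that appear in the signature. The values of lemma and input_dir are hypothetical and chosen only for illustration.

```r
# Hypothetical values, for illustration only
lemma <- "church"
input_dir <- "data"

# Same expressions as the defaults in the signature
cws_detail_path <- file.path(input_dir, paste0(lemma, ".cws.detail.tsv"))
ppmi_path <- file.path(input_dir, paste0(lemma, ".ppmi.tsv"))

print(cws_detail_path)
print(ppmi_path)
```

If the files follow this naming scheme, supplying lemma and input_dir should be enough; otherwise either path can be given explicitly.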
Arguments
- lemma
Name of the lemma: for default filenames
- input_dir
Directory where the files are stored
- cws_detail_path
Path to a dataframe with one row per context word per token, along with information from the token. Created by listContextWords in the semasioFlow Python module.
- ppmi_path
Path to a dataframe with one context word per row and frequency information
- pmi_columnname
Name (or prefix) of the column in the dataframe found in ppmi_path where weighting values are stored.
- distance_corrector_func
Function to filter the rows of the dataframe in cws_detail_path based on the values of the word column, to recalculate distances between words.
- lemma_from_tid_fun
Function to extract the target lemma from the tokenID.
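
To illustrate what the two default functions do, here is a sketch using base R stand-ins (startsWith and strsplit) for the stringr calls in the signature. The example words and the four-field token ID format are assumptions made for illustration; the actual contents of the word column and of the token IDs depend on the corpus.

```r
# Base R stand-in for the default distance_corrector_func, which uses
# stringr::str_starts(); returns TRUE for the rows to keep.
keep_word <- function(word) !startsWith(word, "<")

# Hypothetical values of the word column, including markup-like items
words <- c("the", "<p>", "church", "</p>")
kept <- keep_word(words)
print(kept)

# Base R stand-in for the default lemma_from_tid_fun: split the token ID
# on "/" and drop the third and fourth fields.
lemma_from_tid <- function(tid) {
  parts <- strsplit(tid, "/", fixed = TRUE)[[1]]
  paste(parts[-c(3, 4)], collapse = "/")
}

# Hypothetical token ID with four slash-separated fields
lemma_part <- lemma_from_tid("church/noun/file01/23")
print(lemma_part)
```

Because of the `[[1]]` index, the default lemma_from_tid_fun handles one token ID per call. A custom function passed in its place would need to accept a token ID and return the corresponding lemma string.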