semasioFlow.contextwords module

semasioFlow.contextwords.listContextwords(type_name, tokenlist, fnames, settings, left_win=None, right_win=None)

Create a data frame with details on the context words of each token.

It includes the elements that global_columns and line_machine extract from the corpus, along with each context word's distance to the target, the side on which it occurs, and whether it occurs in the same sentence as the target.

Parameters
  • type_name (str) – Name of the type

  • tokenlist (list of str) – List of token IDs

  • fnames (list of str) – List of file names to find the tokens in

  • settings (dict) – Settings as created for the full workflow

  • left_win (int, optional) – Number of context words to extract from the left side, including sentence delimiters. Defaults to the value given in settings.

  • right_win (int, optional) – Number of context words to extract from the right side, including sentence delimiters. Defaults to the value given in settings.

Returns

Data frame with one row per context word per token, containing information extracted from the corpus and information relative to the target.

Return type

pandas.DataFrame
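
Example

The shape of the result can be illustrated with a toy sketch. This is not the semasioFlow implementation: the helper toy_context_rows, the "</s>" delimiter and the column names word, distance, side and same_sentence are all hypothetical, chosen only to show what "one row per context word per token" with distance, side and same-sentence information might look like. As in the parameter descriptions above, sentence delimiters are assumed to count toward the window size.

```python
def toy_context_rows(words, target_idx, left_win, right_win, delimiter="</s>"):
    """Return one dict per context word around words[target_idx].

    Hypothetical illustration only; names and columns are assumptions,
    not the actual output of listContextwords.
    """
    # Delimiters occupy window positions, so they are counted here.
    start = max(0, target_idx - left_win)
    end = min(len(words), target_idx + right_win + 1)

    rows = []
    for i in range(start, end):
        # Skip the target itself and the delimiters (no row for either).
        if i == target_idx or words[i] == delimiter:
            continue
        lo, hi = sorted((i, target_idx))
        # Same sentence if no delimiter lies strictly between the two.
        same_sentence = delimiter not in words[lo + 1:hi]
        rows.append({
            "word": words[i],
            "distance": abs(i - target_idx),
            "side": "L" if i < target_idx else "R",
            "same_sentence": same_sentence,
        })
    return rows


words = ["the", "cat", "sat", "</s>", "a", "dog", "barked", "loudly"]
# Target is "dog" (index 5); 3 positions to the left, 2 to the right.
rows = toy_context_rows(words, target_idx=5, left_win=3, right_win=2)
for row in rows:
    print(row)
```

Passing such a list of dictionaries to pandas.DataFrame(rows) would give a data frame with one row per context word, comparable in spirit to the return value described above.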