semasioFlow.sample module

semasioFlow.sample.sampleTypes(selection, fnames, settings, oneperfile=True, concordance=None)

Generate a random sample of tokens and the list of files required to extract them.

Parameters
  • selection (dict) – Types to look for as keys, number of tokens to extract from each of them as values.

  • fnames (str or list) – Files to search: either the path to a file containing the list of filenames, as a string, or the list itself.

  • settings (dict) – Configuration settings as defined in the nephosem workflow.

  • oneperfile (bool) – Whether only one token of each lemma can be extracted from the same file.

  • concordance (str) – Name of the file in which to store the concordance. If None, no concordance is generated.

Returns

A list of token IDs and a list of the files where they can be found. The results are not separated by type.

Return type

tuple
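
Example

The sketch below illustrates the sampling behaviour described above, in particular the effect of oneperfile. It is not the semasioFlow implementation and does not call the library: sample_tokens, the occurrences mapping, the token ID strings, and the filenames are all hypothetical, invented only to make the example self-contained.

```python
import random


def sample_tokens(selection, occurrences, oneperfile=True, seed=0):
    """Hypothetical stand-in for the sampling step (not semasioFlow code).

    selection:   dict mapping each type to the number of tokens wanted.
    occurrences: dict mapping each type to a list of
                 (token_id, filename) pairs (assumed, made-up structure).
    """
    rng = random.Random(seed)
    tokens = []      # token IDs for all types, not separated by type
    files = set()    # files needed to extract the sampled tokens

    for type_name, wanted in selection.items():
        candidates = list(occurrences.get(type_name, []))
        rng.shuffle(candidates)
        used_files = set()   # files already used for this type
        picked = 0
        for token_id, fname in candidates:
            if picked == wanted:
                break
            # With oneperfile, skip files that already gave a token of this type.
            if oneperfile and fname in used_files:
                continue
            used_files.add(fname)
            tokens.append(token_id)
            files.add(fname)
            picked += 1

    return tokens, sorted(files)


# Made-up data: two types, with token IDs in an invented format.
occurrences = {
    "heet/adj": [
        ("heet/adj/file1/3", "file1.txt"),
        ("heet/adj/file1/17", "file1.txt"),
        ("heet/adj/file2/5", "file2.txt"),
    ],
    "stof/noun": [
        ("stof/noun/file2/8", "file2.txt"),
        ("stof/noun/file3/2", "file3.txt"),
    ],
}
selection = {"heet/adj": 2, "stof/noun": 1}

result = sample_tokens(selection, occurrences, oneperfile=True)
tokens, files = result
print(tokens)
print(files)
```

With oneperfile=True, the two "heet/adj" tokens in this sketch necessarily come from different files. With oneperfile=False, both could come from file1.txt.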