semasioFlow.sample module
- semasioFlow.sample.sampleTypes(selection, fnames, settings, oneperfile=True, concordance=None)
Generate a random sample of tokens and the list of files required to extract them.
- Parameters
selection (dict) – Maps each type to look for (key) to the number of tokens to extract for that type (value).
fnames (str or list) – Filenames to search: either the path to a file containing the list, as a string, or the list itself.
settings (dict) – Configuration settings as defined by the nephosem workflow.
oneperfile (bool) – Whether only one token of each lemma can be extracted from the same file.
concordance (str) – Name of the file in which to store the concordance. If None, no concordance is generated.
- Returns
A list of token IDs and the list of files in which they can be found. The token IDs are not separated by type.
- Return type
tuple
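
The sampling behaviour described above can be illustrated with a minimal, self-contained sketch. It is not semasioFlow's implementation and does not call the library: the helper `sample_tokens`, the `occurrences` mapping, the `seed` argument and the token ID strings are all hypothetical, chosen only to show how a `selection` dict and the `oneperfile` constraint could lead to a flat list of token IDs plus the list of files needed to extract them.

```python
import random


def sample_tokens(selection, occurrences, oneperfile=True, seed=None):
    """Hypothetical stand-in for the sampling step (not semasioFlow's code).

    selection:   {type: number of tokens wanted}
    occurrences: {type: [(token_id, filename), ...]}  # assumed, pre-collected
    """
    rng = random.Random(seed)
    tokens = []    # flat list, not separated by type
    files = set()  # files needed to extract the sampled tokens

    for type_, wanted in selection.items():
        candidates = list(occurrences.get(type_, []))
        rng.shuffle(candidates)
        used_files = set()  # files already used for this type
        picked = 0
        for token_id, fname in candidates:
            if picked == wanted:
                break
            # With oneperfile, skip further tokens of this type from the same file.
            if oneperfile and fname in used_files:
                continue
            used_files.add(fname)
            tokens.append(token_id)
            files.add(fname)
            picked += 1

    return tokens, sorted(files)


# Made-up occurrences; the token ID format is illustrative only.
occurrences = {
    "run/verb": [
        ("run/verb/a.txt/3", "a.txt"),
        ("run/verb/a.txt/9", "a.txt"),
        ("run/verb/b.txt/4", "b.txt"),
    ],
    "walk/verb": [("walk/verb/b.txt/7", "b.txt")],
}

tokens, files = sample_tokens({"run/verb": 2, "walk/verb": 1}, occurrences, seed=0)
```

In this sketch, as in the documented return value, the result is a tuple of a flat token list and a file list. If a type has fewer eligible tokens than requested, the sketch simply returns as many as it finds; how `sampleTypes` itself handles that case is not stated here.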