Summarize HDBSCAN data for a model
Usage
summarizeHDBSCAN(
  lemma,
  modelname,
  input_dir,
  output_dir,
  minPts = 8,
  includePlot = FALSE,
  coords_name = ".tsne.30"
)
Arguments
- lemma
Name of the lemma, for filenames
- modelname
Name of the model, for coordinates and filename
- input_dir
Directory where the distance matrix is stored
- output_dir
Directory where coordinates are stored. This directory must contain:
  - A file with the coordinates of the tokens, with a name combining lemma and coords_name and ending in .tsv.
  - A file with coordinates for the context words, with a name combining lemma and coords_name and ending in .cws.tsv.
  - A file with semicolon-separated lists of context words, with a name combining lemma and .variables.tsv.
- minPts
Minimum points for hdbscan
- includePlot
Whether to include the plot (requires cowplot).
- coords_name
The code in the coordinate files indicating the type of dimensionality reduction performed, for filenames
Value
A list with at least two items:
- coords: a tibble with one row per token, the coordinates in the pertinent file, and information from extractHDBSCAN as well as the variables file.
- cws: a tibble with one row per context word and cluster, output from cwsForClusters, combined with coordinates from the relevant file.
- hplot: if includePlot is TRUE, the HDBSCAN plot.
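Examples
The following is a minimal sketch of a call, not an official example. The lemma, model name, and directory paths are hypothetical placeholders, and the code assumes that the package providing summarizeHDBSCAN is already attached and that the directories hold the files described under Arguments. The call is guarded so that nothing runs when those assumptions do not hold.

```r
# Hypothetical values: replace the lemma, model name, and directories with your own.
args <- list(
  lemma = "example_lemma",
  modelname = "example_model",
  input_dir = file.path("data", "distances"),     # assumed location of the distance matrix
  output_dir = file.path("data", "coordinates"),  # assumed location of the coordinate files
  minPts = 8,                                     # documented default
  includePlot = FALSE,                            # documented default
  coords_name = ".tsne.30"                        # documented default
)

# Only run when the function is available and both directories exist.
can_run <- exists("summarizeHDBSCAN", mode = "function") &&
  dir.exists(args$input_dir) && dir.exists(args$output_dir)

if (can_run) {
  res <- do.call(summarizeHDBSCAN, args)

  # One row per token, with coordinates and clustering information.
  print(head(res$coords))

  # One row per context word and cluster.
  print(head(res$cws))

  # hplot is only expected when includePlot = TRUE.
  if (!is.null(res$hplot)) print(res$hplot)
}
```

Keeping the arguments in a named list makes it easy to rerun the same call with a different minPts or coords_name by changing a single element.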
