Summarize HDBSCAN data for a model
Usage
summarizeHDBSCAN(
  lemma,
  modelname,
  input_dir,
  output_dir,
  minPts = 8,
  includePlot = FALSE,
  coords_name = ".tsne.30"
)
Arguments
- lemma
Name of the lemma, for filenames
- modelname
Name of the model, for coordinates and filename
- input_dir
Directory where the distance matrix is stored
- output_dir
Directory where coordinates are stored. This directory must contain:
  - A file with the coordinates of the tokens, with a name combining lemma and coords_name and ending in .tsv.
  - A file with coordinates for the context words, with a name combining lemma and coords_name and ending in .cws.tsv.
  - A file with semicolon-separated lists of context words, with a name combining lemma and .variables.tsv.
- minPts
Minimum points for hdbscan
- includePlot
Whether to include the plot (requires cowplot).
- coords_name
The code in the coordinate files indicating the type of dimensionality reduction performed, for filenames
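The file naming described under output_dir can be sketched as follows. This is only an illustration: it assumes that "combining" means plain concatenation of lemma, coords_name, and the file ending, and the lemma and directory values are hypothetical placeholders.

```r
# Hypothetical values, for illustration only
lemma <- "example_lemma"
coords_name <- ".tsne.30"
output_dir <- "output"

# Assumption: file names are built by plain concatenation
expected_files <- c(
  tokens    = file.path(output_dir, paste0(lemma, coords_name, ".tsv")),
  cws       = file.path(output_dir, paste0(lemma, coords_name, ".cws.tsv")),
  variables = file.path(output_dir, paste0(lemma, ".variables.tsv"))
)

# Collect the expected files that are not present in output_dir
missing_files <- expected_files[!file.exists(expected_files)]
if (length(missing_files) > 0) {
  message("Missing files: ", paste(missing_files, collapse = ", "))
}
```

A check like this before calling the function may help distinguish a naming mismatch from a genuinely missing file.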
Value
A list with at least two items:
- coords: a tibble with one row per token, the coordinates in the pertinent file, and information from extractHDBSCAN as well as the variables file.
- cws: a tibble with one row per context word and cluster, output from cwsForClusters, combined with coordinates from the relevant file.
- hplot: if includePlot is TRUE, the HDBSCAN plot.
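A call might look like the sketch below. All argument values are hypothetical placeholders, and the code assumes that summarizeHDBSCAN() is already available in the session and that the input files exist; the guard simply skips the call otherwise.

```r
# Assumption: summarizeHDBSCAN() has already been loaded into the session.
# All argument values below are hypothetical placeholders.
if (exists("summarizeHDBSCAN") && dir.exists("output")) {
  res <- summarizeHDBSCAN(
    lemma = "example_lemma",
    modelname = "example_model",
    input_dir = "distances",
    output_dir = "output",
    minPts = 8,
    includePlot = FALSE,
    coords_name = ".tsne.30"
  )
  # Inspect the top-level items of the returned list (coords, cws, ...)
  str(res, max.level = 1)
}
```

With includePlot = FALSE, the returned list would be expected to hold coords and cws but no hplot item.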