
Summarize HDBSCAN data for a model

Usage

summarizeHDBSCAN(
  lemma,
  modelname,
  input_dir,
  output_dir,
  minPts = 8,
  includePlot = FALSE,
  coords_name = ".tsne.30"
)

Arguments

lemma

Name of the lemma, for filenames

modelname

Name of the model, for coordinates and filename

input_dir

Directory where the distance matrix is stored

output_dir

Directory where coordinates are stored. This directory must contain:

  • A file with the coordinates of the tokens, with a name combining lemma and coords_name and ending in .tsv.

  • A file with coordinates for the context words, with a name combining lemma and coords_name and ending in .cws.tsv.

  • A file with semicolon-separated lists of context words, with a name combining lemma and .variables.tsv.
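As a rough sketch of the naming scheme described above, the following base R snippet builds the three expected file paths and reports which ones are missing. It assumes that "combining" means plain concatenation of the pieces. The lemma mylemma and the directory output are hypothetical placeholders, not values taken from the package.

```r
# Hypothetical values, for illustration only.
lemma <- "mylemma"
output_dir <- "output"
coords_name <- ".tsne.30"  # default value shown in the Usage section

# ASSUMPTION: file names are built by simple concatenation.
file_names <- c(
  tokens    = paste0(lemma, coords_name, ".tsv"),
  cws       = paste0(lemma, coords_name, ".cws.tsv"),
  variables = paste0(lemma, ".variables.tsv")
)
expected_files <- file.path(output_dir, file_names)
names(expected_files) <- names(file_names)

# Report any file that is not yet present in output_dir.
missing_files <- expected_files[!file.exists(expected_files)]
if (length(missing_files) > 0) {
  message("Missing: ", paste(missing_files, collapse = ", "))
}

# A semicolon-separated list of context words can be split with strsplit().
context_words <- strsplit("word1;word2;word3", ";", fixed = TRUE)[[1]]
```

Running a check like this before calling summarizeHDBSCAN can make a missing or misnamed input file easier to spot.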

minPts

Minimum number of points for HDBSCAN

includePlot

Whether to include the plot (requires cowplot).

coords_name

The code in the coordinate files indicating the type of dimensionality reduction performed, for filenames

Value

A list with at least two items (coords and cws) and, optionally, a third (hplot):

  • coords: a tibble with one row per token, the coordinates in the pertinent file, and information from extractHDBSCAN as well as the variables file.

  • cws: a tibble with one row per context word and cluster, output from cwsForClusters, combined with coordinates from the relevant file.

  • hplot: the HDBSCAN plot, included only if includePlot is TRUE.