The glossr package encourages you to keep your examples in one dataframe that you can extract glosses from. You can filter it based on the label names or any other variables and print a series of glosses next to each other with one call.
If you like this feature and you have, for example, a dataframe called glosses, you might find yourself calling variations of gloss_df(filter(glosses, label == "my-label")) multiple times in a text. This vignette shows how to work with gloss_factory() so that you only need to type my_gloss("my-label") instead. In addition, this function performs some validation on your dataframe to avoid undesired output.
library(glossr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#>
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#>
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
Create a gloss factory
The first thing you need to do is assign the return value of gloss_factory() to a short variable name that works for you. I recommend trying this out in the console first, and then calling it in a setup R chunk that doesn’t print messages or warnings.
by_label <- gloss_factory(glosses)
#> ℹ The following columns will be used for the gloss texts, in the following order:
#> ✔ `source` (not aligned!)
#> ✔ `original`, `parsed`, and `language` (aligned columns)
#> ✔ `translation` (not aligned!)
#> ✔ The `label` column will be used for labels.
By default (unless verbose = FALSE), gloss_factory() prints a few messages after checking the dataframe that was provided: it checks whether there are source, translation and label columns (“not aligned”, because they are printed as running text) and which of the remaining columns hold content for the text lines (“aligned”, because they are aligned to each other word by word). Notice that here it includes the language column in the group of aligned lines, which we don’t want, so we would prefer to remove it.
If any of the expected columns (source, translation or label) is not present, it will print a warning. These are just warnings: maybe it’s exactly what you are expecting, and that’s OK.
by_label <- glosses |>
  select(-language, -translation) |>
  gloss_factory()
#> ℹ The following columns will be used for the gloss texts, in the following order:
#> ✔ `source` (not aligned!)
#> ✔ `original` and `parsed` (aligned columns)
#> ✖ `translation` (not aligned!)
#> ✔ The `label` column will be used for labels.
If there are too many text columns, it will also warn you:
by_label <- glosses |>
  rename(trans = translation) |>
  gloss_factory()
#> ! There are 4 columns that can be printed as text: `original`, `parsed`,
#> `trans`, and `language`. Only the first three will be used.
#> ℹ The following columns will be used for the gloss texts, in the following order:
#> ✔ `source` (not aligned!)
#> ✔ `original`, `parsed`, and `trans` (aligned columns)
#> ✖ `translation` (not aligned!)
#> ✔ The `label` column will be used for labels.
We can either remove the extra column from the dataframe before giving it to gloss_factory() or add its name to the ignore_columns argument. The latter lets us keep using the column for filtering without gloss_df() finding out about its existence. Other kinds of modifications, however, have to be performed beforehand.
modified_glosses <- glosses |>
  mutate(source = paste0("(", source, ")"))
by_label <- modified_glosses |>
  gloss_factory(ignore_columns = "language")
#> ℹ The following columns will be used for the gloss texts, in the following order:
#> ✔ `source` (not aligned!)
#> ✔ `original` and `parsed` (aligned columns)
#> ✔ `translation` (not aligned!)
#> ✔ The `label` column will be used for labels.
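The checks reported in these messages can be pictured with a small base R sketch. It is only an illustration under assumptions: classify_columns() and its max_aligned argument are hypothetical names, and the logic is inferred from the messages above, not taken from the glossr source.

```r
# Toy sketch of the column check; names and logic are assumptions,
# not glossr's actual implementation.
classify_columns <- function(df, ignore_columns = character(0), max_aligned = 3) {
  expected <- c("source", "translation", "label")
  # Expected running-text columns that the dataframe lacks
  missing_expected <- setdiff(expected, names(df))
  # Everything else, minus ignored columns, is a candidate aligned line
  candidates <- setdiff(names(df), c(expected, ignore_columns))
  if (length(candidates) > max_aligned) {
    warning("Only the first ", max_aligned, " text columns will be used.")
  }
  list(missing = missing_expected, aligned = head(candidates, max_aligned))
}

toy <- data.frame(
  label = "feel-dutch", source = "(Ross 1996:204)",
  original = "Ik heb het koud", parsed = "1SG have 3SG COLD.A",
  translation = "I am cold; literally: I have it cold.", language = "Dutch",
  stringsAsFactors = FALSE
)

cols <- classify_columns(toy, ignore_columns = "language")
cols$aligned  # "original" "parsed"
cols$missing  # character(0): all expected columns are present
```

Ignoring a column here only removes it from the candidate text lines; the column itself stays in the dataframe, which is why it remains available for filtering.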
gloss_factory() is a function factory: its output is a function.
class(by_label)
#> [1] "function"
This means that you call gloss_factory() once at the beginning, and then your created function as many times as you need. Here the function is called by_label(), but you can choose the name that suits you best. As you can see below, by_label("heartwarming-jp") is equivalent to gloss_df(filter(modified_glosses, label == "heartwarming-jp")).
by_label("heartwarming-jp")
-
(Shindo 2015:660)
Kotae-nagara otousan to okaasan wa honobonoto atatakai2 mono ni tsutsum-areru kimochi ga shi-ta.
reply-while father and mother TOP heartwarming warm thing with surround-PASS feeling NOM do-PST
“While replying (to your question), Father and Mother felt like they were surrounded by something heart warming.”
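For readers new to function factories, the idea can be reduced to a few lines of base R. The sketch below is a hypothetical stand-in (make_lookup() is not part of glossr, and it returns rows instead of printing glosses); it only shows how the returned function remembers the dataframe it was created with.

```r
# Hypothetical stand-in for a factory: it returns a function that
# keeps a reference to `df` in its enclosing environment.
make_lookup <- function(df, id_column = "label") {
  function(...) {
    ids <- c(...)
    # Look the requested ids up in the captured dataframe
    df[match(ids, df[[id_column]]), , drop = FALSE]
  }
}

toy <- data.frame(
  label = c("a", "b", "c"),
  original = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

by_label_toy <- make_lookup(toy)  # call the factory once
class(by_label_toy)               # "function"
by_label_toy("b")$original        # "y"
```

The dataframe is passed once, when the factory is called; every later call only needs the ids.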
Filter by label or id
By default, the function created by gloss_factory() will take a label or set of labels and use it for filtering. In principle, the call below is equivalent to gloss_df(filter(modified_glosses, label %in% c("heartwarming-jp", "languid-jp", "feel-dutch"))). However, unlike filter(), it keeps the requested order of your items!
by_label("heartwarming-jp", "languid-jp", "feel-dutch")
-
(Shindo 2015:660)
Kotae-nagara otousan to okaasan wa honobonoto atatakai2 mono ni tsutsum-areru kimochi ga shi-ta.
reply-while father and mother TOP heartwarming warm thing with surround-PASS feeling NOM do-PST
“While replying (to your question), Father and Mother felt like they were surrounded by something heart warming.”
-
(Shindo 2015:660)
Ainiku sonna shumi wa nai. Tsumetai-none. Kedaru-souna koe da-tta.
unfortunately such interest TOP not.exist cold-EMPH languid-seem voice COP-PST
“Unfortunately I never have such an interest. You are so cold. (Her) voice sounded languid.”
-
(Ross 1996:204)
Ik heb het koud
1SG have 3SG COLD.A
“I am cold; literally: I have it cold.”
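The difference in ordering is easy to see on a toy dataframe. The sketch below uses base R only and is an assumption about how such ordering can be achieved, not a description of glossr's internals: a membership test with %in% returns rows in the order of the data, while match() returns them in the order of the request.

```r
toy <- data.frame(
  label = c("a", "b", "c"),
  original = c("x", "y", "z"),
  stringsAsFactors = FALSE
)
ids <- c("c", "a")

# Membership test: rows come back in the order they have in the data
in_data_order <- toy[toy$label %in% ids, , drop = FALSE]

# Position lookup: rows come back in the order they were requested
in_requested_order <- toy[match(ids, toy$label), , drop = FALSE]

in_data_order$label       # "a" "c"
in_requested_order$label  # "c" "a"
```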
You could also set a different column for your ids with the id_column argument. gloss_factory() will warn you if the values are not unique (in case you were expecting them to be).
by_language <- modified_glosses |>
  gloss_factory(id_column = "language", ignore_columns = "language")
#> ℹ The following columns will be used for the gloss texts, in the following order:
#> ✔ `source` (not aligned!)
#> ✔ `original` and `parsed` (aligned columns)
#> ✔ `translation` (not aligned!)
#> ✔ The `label` column will be used for labels.
#> ! The values in `language` are not unique. Only the first match of repeated ids will be returned.
by_language("Icelandic")
-
(Einarsson 1945:170)
Mér er heitt/kalt
1SG.DAT COP.1SG.PRS hot/cold.A
“I am hot/cold.”
You will also get a warning if one of your requested ids is not in your dataset.
by_language("Japanese", "Mandarin")
#> ! The following ids are not present in the dataset:
#> • Mandarin
-
(Shindo 2015:660)
Kotae-nagara otousan to okaasan wa honobonoto atatakai2 mono ni tsutsum-areru kimochi ga shi-ta.
reply-while father and mother TOP heartwarming warm thing with surround-PASS feeling NOM do-PST
“While replying (to your question), Father and Mother felt like they were surrounded by something heart warming.”
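Both warnings can be mimicked with a short base R sketch. The function check_ids() and the toy data are hypothetical, and the logic is only inferred from the messages shown above: repeated ids resolve to their first match, and ids that are absent from the data are reported and skipped.

```r
# Hypothetical helper mirroring the two warnings; not glossr code.
check_ids <- function(df, id_column, ids) {
  values <- df[[id_column]]
  if (anyDuplicated(values) > 0) {
    warning("Values in `", id_column, "` are not unique; first match is used.")
  }
  absent <- setdiff(ids, values)
  if (length(absent) > 0) {
    warning("Ids not present in the dataset: ", paste(absent, collapse = ", "))
  }
  present <- ids[ids %in% values]
  # match() returns the position of the first occurrence of each id
  df[match(present, values), , drop = FALSE]
}

toy <- data.frame(
  label = c("heartwarming-jp", "languid-jp", "feel-dutch"),
  language = c("Japanese", "Japanese", "Dutch"),
  stringsAsFactors = FALSE
)

# Capture the warnings instead of letting them print
res <- suppressWarnings(check_ids(toy, "language", c("Japanese", "Mandarin")))
res$label  # "heartwarming-jp"
```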
Filter with other conditional statements
While filtering by label name might be the most common case, you might want a bit more freedom. It is possible to create a different kind of function with the use_conditionals argument. In that case, the new function will take whatever conditions you give it and send them to dplyr::filter().
by_cond <- modified_glosses |>
  gloss_factory(use_conditionals = TRUE, ignore_columns = "language")
#> ℹ The following columns will be used for the gloss texts, in the following order:
#> ✔ `source` (not aligned!)
#> ✔ `original` and `parsed` (aligned columns)
#> ✔ `translation` (not aligned!)
#> ✔ The `label` column will be used for labels.
by_cond(stringr::str_ends(label, "jp"))
-
(Shindo 2015:660)
Kotae-nagara otousan to okaasan wa honobonoto atatakai2 mono ni tsutsum-areru kimochi ga shi-ta.
reply-while father and mother TOP heartwarming warm thing with surround-PASS feeling NOM do-PST
“While replying (to your question), Father and Mother felt like they were surrounded by something heart warming.”
-
(Shindo 2015:660)
Ainiku sonna shumi wa nai. Tsumetai-none. Kedaru-souna koe da-tta.
unfortunately such interest TOP not.exist cold-EMPH languid-seem voice COP-PST
“Unfortunately I never have such an interest. You are so cold. (Her) voice sounded languid.”
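The mechanism behind a conditional filter can be sketched without dplyr. The example below is a base R analogue under assumptions (make_cond_lookup() is a hypothetical name, and glossr itself forwards the conditions to dplyr::filter() as described above): the returned function captures the unevaluated condition and evaluates it with the dataframe's columns in scope, much like subset() does.

```r
# Hypothetical base R analogue of a conditional factory; not glossr code.
make_cond_lookup <- function(df) {
  function(condition) {
    # Capture the expression as typed and evaluate it inside `df`,
    # falling back to the caller's environment for other names
    keep <- eval(substitute(condition), df, parent.frame())
    df[!is.na(keep) & keep, , drop = FALSE]
  }
}

toy <- data.frame(
  label = c("heartwarming-jp", "languid-jp", "feel-dutch"),
  language = c("Japanese", "Japanese", "Dutch"),
  stringsAsFactors = FALSE
)

by_cond_toy <- make_cond_lookup(toy)
jp <- by_cond_toy(endsWith(label, "jp"))
jp$label  # "heartwarming-jp" "languid-jp"
```

Because the condition is evaluated against the full dataframe, any column can be used in it, including one that is never printed.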
Many factories?
One of the advantages of a function factory is that you can create a function tailored to the dataset you’re working with. You don’t need to name your dataset in every call, and you save on typing.
In addition, you could have multiple factory-made functions in one project. Within a file, you may create both a by_label() and a by_cond() function to work with label and conditional filtering, whichever suits you best at any time. Or you could also have a dutch_gloss() and a chinese_gloss(), for example, each using a different dataset!