wordpiece: R Implementation of Wordpiece Tokenization

Apply 'Wordpiece' (<arXiv:1609.08144>) tokenization to input text, given an appropriate vocabulary. The 'BERT' (<arXiv:1810.04805>) tokenization conventions are used by default.
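
As a rough illustration of what tokenizing "given an appropriate vocabulary" involves, the sketch below shows greedy longest-match-first subword splitting with the usual BERT conventions (a "##" prefix on word-internal pieces and "[UNK]" for words that cannot be split). It is written in Python for illustration only and is not this package's code or API. The function names and the toy vocabulary are hypothetical, and splitting on whitespace with lowercasing is a simplifying assumption; real BERT preprocessing also handles punctuation and other normalization.

```python
def wordpiece_tokenize_word(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into pieces by greedy longest-match-first lookup."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Word-internal pieces carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


def wordpiece_tokenize(text, vocab):
    """Tokenize text: lowercase, split on whitespace (a simplification), then split words."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece_tokenize_word(word, vocab))
    return tokens


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"[UNK]", "un", "##aff", "##able", "token", "##ization", "##s"}

print(wordpiece_tokenize("Unaffable tokens", toy_vocab))
```

With this toy vocabulary, "unaffable" splits into "un", "##aff", "##able", while a word with no matching pieces maps to the single token "[UNK]".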

Version: 1.0.2
Depends: R (≥ 3.3.0)
Imports: digest (≥ 0.6.5), purrr (≥ 0.2.3), rappdirs (≥ 0.3), stringi (≥ 1.0)
Suggests: testthat (≥ 2.1.0), knitr, rmarkdown, covr
Published: 2021-02-11
Author: Jonathan Bratt [aut, cre], Jon Harmon [aut], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph]
Maintainer: Jonathan Bratt <jonathan.bratt at macmillan.com>
BugReports: https://github.com/jonathanbratt/wordpiece/issues
License: Apache License (≥ 2)
URL: https://github.com/jonathanbratt/wordpiece
NeedsCompilation: no
Materials: README, NEWS
CRAN checks: wordpiece results

Downloads:

Reference manual: wordpiece.pdf
Vignettes: Using wordpiece
Package source: wordpiece_1.0.2.tar.gz
Windows binaries: r-devel: wordpiece_1.0.2.zip, r-release: wordpiece_1.0.2.zip, r-oldrel: wordpiece_1.0.2.zip
macOS binaries: r-release: wordpiece_1.0.2.tgz, r-oldrel: wordpiece_1.0.2.tgz

Linking:

Please use the canonical form https://CRAN.R-project.org/package=wordpiece to link to this page.