mlvocab

Vocabulary and Corpus Preprocessing for Natural Language Pipelines

Utilities for preprocessing text corpora into data structures suitable for natural language models: integer sequences or matrices, vocabulary embedding matrices, and term-document, document-term, and term co-occurrence matrices. All functions allow full or partial hashing of the terms in the vocabulary.
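To make the data structures named above concrete, here is a minimal base-R sketch of a vocabulary, integer sequences, a document-term matrix, and partial hashing. It does not call mlvocab; the toy corpus, the helper names `toy_bucket` and `encode`, and the character-sum hash are all hypothetical, and "partial hashing" is read here as an assumption: known terms keep their vocabulary index while unknown terms fall into a fixed number of hash buckets placed after the vocabulary.

```r
# Toy corpus (hypothetical): each document is already split into tokens.
corpus <- list(
  doc1 = c("the", "cat", "sat"),
  doc2 = c("the", "dog", "sat", "down")
)

# Vocabulary: unique terms in order of first appearance.
vocab <- unique(unlist(corpus, use.names = FALSE))

# Integer sequences: each token replaced by its position in the vocabulary.
int_seqs <- lapply(corpus, match, table = vocab)

# Document-term matrix: one row per document, one column per term, cells are counts.
dtm <- t(vapply(
  corpus,
  function(doc) tabulate(match(doc, vocab), nbins = length(vocab)),
  integer(length(vocab))
))
colnames(dtm) <- vocab

# Toy hash (an assumption, not the package's hash): sum of code points modulo
# the number of buckets, shifted to 1-based bucket numbers.
toy_bucket <- function(term, nbuckets) {
  sum(utf8ToInt(term)) %% nbuckets + 1L
}

# Partial hashing sketch: known terms keep their vocabulary index,
# unknown terms are mapped to bucket indices that follow the vocabulary.
nbuckets <- 4L
encode <- function(doc) {
  ix <- match(doc, vocab)
  unknown <- is.na(ix)
  ix[unknown] <- length(vocab) + vapply(
    doc[unknown], toy_bucket, integer(1),
    nbuckets = nbuckets, USE.NAMES = FALSE
  )
  ix
}

encode(c("the", "bird"))  # "bird" is not in vocab, so it lands in a hash bucket
```

With full hashing under the same assumption, every term would go through `toy_bucket` and no vocabulary would need to be stored, at the cost of occasional collisions between distinct terms.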

Downloads

Total: 337
Last month: 92
Last week: 21
Average per day: 3

Description file content

Package: mlvocab
Title: Vocabulary and Corpus Preprocessing for Natural Language Pipelines
Version: 0.0.1
Description: Utilities for preprocessing of text corpora into data structures suitable for natural language models: integer sequences or matrices, vocabulary embedding matrices, term-doc, doc-term, term co-occurrence matrices etc. All functions allow for full or partial hashing of the terms in the vocabulary.
Depends: R (>= 3.4.0)
License: GPL-3
Encoding: UTF-8
Imports: Rcpp (>= 0.12), Matrix, digest (>= 0.6.8), sparsepp (>= 0.2.0)
LinkingTo: Rcpp, digest (>= 0.6.8), sparsepp (>= 0.2.0)
Suggests: testthat, knitr
LazyData: true
SystemRequirements: C++11
BugReports: https://github.com/vspinu/mlvocab/issues
URL:
RoxygenNote: 6.0.1
NeedsCompilation: yes
Packaged: 2018-04-12 19:02:49 UTC; vspinu
Author: Vitalie Spinu [aut, cre]
Maintainer: Vitalie Spinu
Repository: CRAN
Date/Publication: 2018-04-13 08:50:01 UTC

install.packages('mlvocab')

Version: 0.0.1 (released 3 months ago)
GitHub: https://github.com/vspinu/mlvocab/
Maintainer: Vitalie Spinu
License: GPL-3
