mlvocab

Vocabulary and Corpus Preprocessing for Natural Language Pipelines

Utilities for preprocessing text corpora into data structures suitable for natural language models: integer sequences or matrices, vocabulary embedding matrices, and term-document, document-term, and term co-occurrence matrices. All functions support full or partial hashing of the terms in the vocabulary.

Downloads: 1,116 total; 16 last month; 4 last week; 1 per day on average

Description file content

Package: mlvocab
Title: Vocabulary and Corpus Preprocessing for Natural Language Pipelines
Version: 0.1
Description: Utilities for preprocessing of text corpora into data structures suitable for natural language models: integer sequences or matrices, vocabulary embedding matrices, term-doc, doc-term, term co-occurrence matrices etc. All functions allow for full or partial hashing of the terms in the vocabulary.
Depends: methods, Rcpp (>= 0.12), Matrix, digest (>= 0.6.8), sparsepp (>= 0.2.0), R (>= 3.4.0)
License: GPL-3
Encoding: UTF-8
LinkingTo: Rcpp (>= 0.12.9), digest (>= 0.6.8), sparsepp (>= 0.2.0)
Suggests: testthat, knitr
LazyData: true
SystemRequirements: C++11 with support for regex (such as GCC 4.9 or later, > 5 preferred)
BugReports: https://github.com/vspinu/mlvocab/issues
URL: https://github.com/vspinu/mlvocab/
RoxygenNote: 6.1.0
NeedsCompilation: yes
Packaged: 2018-09-17 17:50:05 UTC; vspinu
Author: Vitalie Spinu [aut, cre]
Maintainer: Vitalie Spinu
Repository: CRAN
Date/Publication: 2018-09-18 08:40:02 UTC

install.packages('mlvocab')

0.1

2 months ago

https://github.com/vspinu/mlvocab/

Vitalie Spinu

GPL-3
