tokenizers

A Consistent Interface to Tokenize Natural Language Text

Convert natural language text into tokens. The tokenizers have a consistent interface and are compatible with Unicode, thanks to being built on the 'stringi' package. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, lines, and regular expressions.
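Each tokenize_*() function in the package takes a character vector (one string per document) and returns a list of character vectors of tokens, one element per input document. A minimal sketch of that interface, using functions exported by the package; the sample text here is invented for illustration:

library(tokenizers)

text <- "The quick brown fox jumps over the lazy dog. It barked."

# Words: the defaults lowercase the text and strip punctuation.
tokenize_words(text)

# Sentences: split on sentence boundaries.
tokenize_sentences(text)

# Shingled n-grams: overlapping runs of n words.
tokenize_ngrams(text, n = 2)

# Skip n-grams: n-grams that may skip up to k intervening words.
tokenize_skip_ngrams(text, n = 2, k = 1)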

Downloads

Total: 86,777
Last month: 5,496
Last week: 1,562
Average per day: 183

[Charts: daily downloads; total downloads]

Description file content

Package: tokenizers
Type: Package
Title: A Consistent Interface to Tokenize Natural Language Text
Version: 0.1.4
Description: Convert natural language text into tokens. The tokenizers have a consistent interface and are compatible with Unicode, thanks to being built on the 'stringi' package. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, lines, and regular expressions.
License: MIT + file LICENSE
LazyData: yes
URL: https://github.com/ropensci/tokenizers
BugReports: https://github.com/ropensci/tokenizers/issues
RoxygenNote: 5.0.1
Depends: R (>= 3.1.3)
Imports: stringi (>= 1.0.1), Rcpp (>= 0.12.3), SnowballC (>= 0.5.1)
LinkingTo: Rcpp
Suggests: testthat, covr, knitr, rmarkdown
VignetteBuilder: knitr
NeedsCompilation: yes
Packaged: 2016-08-29 19:15:04 UTC; lmullen
Author: Lincoln Mullen [aut, cre], Dmitriy Selivanov [ctb]
Maintainer: Lincoln Mullen
Repository: CRAN
Date/Publication: 2016-08-29 22:59:29

Installation

install.packages('tokenizers')
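Once installed, the package can be loaded and used directly. A brief sketch covering the stemming and regular-expression tokenizers listed in the description; the input strings are invented:

library(tokenizers)

# Word stems, via the Snowball stemmer provided by SnowballC.
tokenize_word_stems("Tokenizing texts means splitting them into tokens")

# Splitting on an arbitrary regular expression.
tokenize_regex("one,two,three", pattern = ",")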
