nproc

Neyman-Pearson (NP) Classification Algorithms and NP Receiver Operating Characteristic (NP-ROC) Curves

In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (i.e., the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (i.e., the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, alpha, on the type I error. Although the NP paradigm has a century-long history in hypothesis testing, it has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than alpha do not satisfy the type I error control objective because the resulting classifiers are still likely to have type I errors much larger than alpha. As a result, the NP paradigm has not been properly implemented for many classification scenarios in practice. In this work, we develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, including popular methods such as logistic regression, support vector machines and random forests. Powered by this umbrella algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands, motivated by the popular receiver operating characteristic (ROC) curves. NP-ROC bands will help choose in a data adaptive way and compare different NP classifiers.

Total

10,980

Last month

392

Last week

114

Average per day

13

Daily downloads

Total downloads

Description file content

Package
nproc
Type
Package
Title
Neyman-Pearson (NP) Classification Algorithms and NP Receiver Operating Characteristic (NP-ROC) Curves
Version
2.1.5
Date
2020-01-13
Imports
glmnet, e1071, randomForest, naivebayes, MASS, parallel, ada, stats, graphics, ROCR, tree
Description
In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (i.e., the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (i.e., the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, alpha, on the type I error. Although the NP paradigm has a century-long history in hypothesis testing, it has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than alpha do not satisfy the type I error control objective because the resulting classifiers are still likely to have type I errors much larger than alpha. As a result, the NP paradigm has not been properly implemented for many classification scenarios in practice. In this work, we develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, including popular methods such as logistic regression, support vector machines and random forests. Powered by this umbrella algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands, motivated by the popular receiver operating characteristic (ROC) curves. NP-ROC bands will help choose in a data adaptive way and compare different NP classifiers.
License
GPL-2
LazyData
TRUE
RoxygenNote
6.1.0
URL
Suggests
knitr, rmarkdown
VignetteBuilder
knitr
NeedsCompilation
no
Packaged
2020-01-13 16:01:14 UTC; yangfeng
Author
Yang Feng [aut, cre], Jessica Li [aut], Xin Tong [aut], Ye Tian [ctb]
Maintainer
Yang Feng
Repository
CRAN
Date/Publication
2020-01-13 19:20:05 UTC

install.packages('nproc')

2.1.5

11 days ago

http://advances.sciencemag.org/content/4/2/eaao1659

Yang Feng

GPL-2

Imports

glmnet, e1071, randomForest, naivebayes, MASS, parallel, ada, stats, graphics, ROCR, tree

Suggests

knitr, rmarkdown

Discussions