openEBGM

EBGM Scores for Mining Large Contingency Tables

An implementation of DuMouchel's (1999) <doi:10.1080/00031305.1999.10474456> Bayesian data mining method for the market basket problem. Calculates Empirical Bayes Geometric Mean (EBGM) and quantile scores from the posterior distribution using the Gamma-Poisson Shrinker (GPS) model to find unusually large cell counts in large, sparse contingency tables. Can be used to find unusually high reporting rates of adverse events associated with products. In general, can be used to mine any database where the co-occurrence of two variables or items is of interest. Also calculates relative and proportional reporting ratios. Builds on the work of the 'PhViD' package, from which much of the code is derived. Some of the added features include stratification to adjust for confounding variables and data squashing to improve computational efficiency. Now includes an implementation of the EM algorithm for hyperparameter estimation loosely derived from the 'mederrRank' package.
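For orientation, the Gamma-Poisson Shrinker model behind the EBGM score can be written out as below. This is a summary sketch of the model from DuMouchel (1999); the symbols ($N$, $E$, $\lambda$, $\alpha_k$, $\beta_k$, $p$, $Q_n$) follow that paper and do not appear on this page, so treat the notation as an assumption rather than the package's own documentation.

```latex
% Observed count N for one cell, with expected count E under independence
N \mid \lambda \sim \mathrm{Poisson}(\lambda E)

% Prior on the ratio lambda: a mixture of two gamma distributions
\lambda \sim p\,\mathrm{Gamma}(\alpha_1, \beta_1) + (1 - p)\,\mathrm{Gamma}(\alpha_2, \beta_2)

% Posterior after observing N = n: again a two-component gamma mixture
\lambda \mid N = n \sim Q_n\,\mathrm{Gamma}(\alpha_1 + n, \beta_1 + E)
                    + (1 - Q_n)\,\mathrm{Gamma}(\alpha_2 + n, \beta_2 + E)

% Posterior mixing weight, where f is the negative binomial marginal pmf of N
Q_n = \frac{p\, f(n; \alpha_1, \beta_1, E)}
           {p\, f(n; \alpha_1, \beta_1, E) + (1 - p)\, f(n; \alpha_2, \beta_2, E)}

% EBGM is the geometric mean of the posterior (psi is the digamma function)
\mathrm{E}[\log \lambda \mid N = n]
  = Q_n \left[ \psi(\alpha_1 + n) - \log(\beta_1 + E) \right]
  + (1 - Q_n) \left[ \psi(\alpha_2 + n) - \log(\beta_2 + E) \right]

\mathrm{EBGM} = \exp\!\left( \mathrm{E}[\log \lambda \mid N = n] \right)
```

The five hyperparameters $(\alpha_1, \beta_1, \alpha_2, \beta_2, p)$ are the quantities the description refers to when it mentions hyperparameter estimation.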

Downloads
Total: 4,172
Last month: 315
Last week: 113
Average per day: 11

DESCRIPTION file content

Package: openEBGM
Title: EBGM Scores for Mining Large Contingency Tables
Version: 0.7.0
Maintainer: John Ihrie
Description: An implementation of DuMouchel's (1999) Bayesian data mining method for the market basket problem. Calculates Empirical Bayes Geometric Mean (EBGM) and quantile scores from the posterior distribution using the Gamma-Poisson Shrinker (GPS) model to find unusually large cell counts in large, sparse contingency tables. Can be used to find unusually high reporting rates of adverse events associated with products. In general, can be used to mine any database where the co-occurrence of two variables or items is of interest. Also calculates relative and proportional reporting ratios. Builds on the work of the 'PhViD' package, from which much of the code is derived. Some of the added features include stratification to adjust for confounding variables and data squashing to improve computational efficiency. Now includes an implementation of the EM algorithm for hyperparameter estimation loosely derived from the 'mederrRank' package.
Depends: R (>= 3.2.3)
License: GPL-2 | GPL-3
URL: https://journal.r-project.org/archive/2017/RJ-2017-063/index.html
LazyData: TRUE
RoxygenNote: 6.1.0
Imports: data.table (>= 1.10.0), ggplot2 (>= 2.2.1), stats (>= 3.2.3)
Suggests: dplyr (>= 0.5.0), knitr (>= 1.15.1), rmarkdown (>= 1.2), testthat (>= 1.0.2), tidyr (>= 0.6.0)
VignetteBuilder: knitr
Encoding: UTF-8
NeedsCompilation: no
Packaged: 2018-08-16 16:45:18 UTC; John.Ihrie
Author: John Ihrie [cre, aut], Travis Canida [aut], Ismaïl Ahmed [ctb] (author of 'PhViD' package (derived code)), Antoine Poncet [ctb] (author of 'PhViD'), Sergio Venturini [ctb] (author of 'mederrRank' package (derived code)), Jessica Myers [ctb] (author of 'mederrRank')
Repository: CRAN
Date/Publication: 2018-08-16 18:00:03 UTC

install.packages('openEBGM')
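The scores named in the description can be illustrated without the package itself. The following base R sketch computes a relative reporting ratio, a proportional reporting ratio, and an EBGM score for one cell of a toy table. It deliberately calls no openEBGM functions: `ebgm_sketch` is a hypothetical helper written for this example, the counts are made up, and the hyperparameter values are arbitrary assumptions (the package estimates them from the data instead).

```r
# Hypothetical report counts: rows are products, columns are adverse events.
counts <- matrix(c(20,  5,
                   10, 65),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(product = c("A", "B"),
                                 event   = c("rash", "nausea")))

# Expected counts under independence: (row total * column total) / grand total.
E <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Relative reporting ratio for product A and rash: observed / expected.
N_A_rash  <- counts[1, 1]
RR_A_rash <- N_A_rash / E[1, 1]

# Proportional reporting ratio: rate of rash among reports for A divided by
# the rate of rash among reports for all other products.
PRR_A_rash <- (counts[1, 1] / sum(counts[1, ])) /
  (sum(counts[-1, 1]) / sum(counts[-1, ]))

# Hypothetical helper: EBGM under the Gamma-Poisson Shrinker model for
# given (not estimated) hyperparameters.
ebgm_sketch <- function(N, E, alpha1, beta1, alpha2, beta2, p) {
  # Marginal negative binomial likelihood of N under each prior component.
  f1 <- dnbinom(N, size = alpha1, prob = beta1 / (beta1 + E))
  f2 <- dnbinom(N, size = alpha2, prob = beta2 / (beta2 + E))
  # Posterior probability that lambda came from the first component.
  Q <- p * f1 / (p * f1 + (1 - p) * f2)
  # Posterior expectation of log(lambda) for the gamma mixture.
  log_mean <- Q * (digamma(alpha1 + N) - log(beta1 + E)) +
    (1 - Q) * (digamma(alpha2 + N) - log(beta2 + E))
  exp(log_mean)
}

# Arbitrary hyperparameter values, assumed for illustration only.
score <- ebgm_sketch(N_A_rash, E[1, 1],
                     alpha1 = 0.2, beta1 = 0.1,
                     alpha2 = 2,   beta2 = 4,
                     p = 1 / 3)

print(c(RR = RR_A_rash, PRR = PRR_A_rash, EBGM = score))
```

With these inputs the EBGM score falls below the raw relative reporting ratio, which is the shrinkage toward the prior that gives the model its name.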

https://journal.r-project.org/archive/2017/RJ-2017-063/index.html
