Databases

Databases with R

This CRAN task view contains a list of packages related to accessibility of different databases. This does not include data import/export or data management.

As datasets become larger and larger, it is impossible for people to save them in traditional file formats such as spreadsheet, raw text file, etc., which could not fit on devices with limited storage and could not be easily shared across collaborators. Instead, people nowadays tend to store data in databases for more scalable and reliable data management.

Database systems are often classified based on the database models that they support. Relational databases became dominant in the 1980s. The data in relational databases is modeled as rows and columns in a series of tables with the use of SQL to express the logic for writing and querying data. The tables are relational, e.g. you have a user who users your softwares and those softwares have creators and contributors. Non-relational databases became popular in recent years due to huge demand in storing unstructured data with the use of NoSQL as the query language. Users generally don't need to define the data schema up front. If there are changing requirements in the applications, non-relational databases can be much easier to use and manage.

The content presented in this Task View are undergoing rapid changes in industries and academia. Please send any suggestions to the task view maintainer or submit a pull request or issue to the Github repository of this task view .

The ctv package supports these Task Views. Its functions install.views and update.views allow, respectively, installation or update of packages from a given Task View; the option coreOnly can restrict operations to packages labeled as core below.

Suggestions and corrections by Achim Zeileis, Kirill Müller, Hannes Mühleisen, Rich FitzJohn, Dirk Eddelbuettel, and Hadley Wickham (as well as others I may have forgotten to add here) are gratefully acknowledged. Thanks to Dirk Eddelbuettel who made the initial .ctv file and the Markdown conversion script available at the Github repository of CRAN Task View for High Performance Computing here. Last but not least, thanks to Achim Zeileis who helped me get started on organizing this task view.

Relational Databases

This section includes packages that provides access to relational databases within R.

  • The DBI package provides a database interface definition for communication between R and relational database management systems. It's worth noting that some packages try to follow this interface definition (DBI-compliant) but many existing packages don't.
  • The RMariaDB package provides a DBI-compliant interface to MariaDB and MySQL.
  • The RMySQL package provides the interface to MySQL. Note that this is the legacy DBI interface to MySQL and MariaDB based on old code ported from S-PLUS. A modern MySQL client based on Rcpp is available from the RMariaDB package we listed above.
  • Packages for PostgreSQL, an open-source relational database:
    • The RPostgreSQL package and RPostgres package both provide fully DBI-compliant Rcpp-backed interfaces to PostgreSQL.
    • The rpostgis package provides the interface to its spatial extension PostGIS.
    • The RGreenplum provides a fully DBI-compliant interface to Greenplum, an open-source parallel database on top of PostgreSQL.
  • The ROracle package is a DBI-compliant Oracle database driver based on the OCI.
  • Packages for SQLite, a self-contained, high-reliability, embedded, full-featured, public-domain, SQL database engine:
    • The RSQLite package embeds the SQLite database engine in R and provides an interface compliant with the DBI package.
    • The filehashSQLite package is a simple key-value database using SQLite as the backend.
    • The liteq package provides temporary and permanent message queues for R, built on top of SQLite.
  • The bigrquery package provides the interface to Google BigQuery, Google's fully managed, petabyte scale, low cost analytics data warehouse.
  • MonetDBLite packages provides the interface to MonetDB and can run database embedded in R directly.
  • The RDruid package provides the interface to Apache Druid, a high performance analytics data store for event-driven data.
  • The RH2 package provides the interface to H2 Database Engine, the Java SQL database.
  • The influxdbr package provides the interface to InfluxDB, a time series database designed to handle high write and query loads.
  • The odbc package provides a DBI-compliant interface to drivers of Open Database Connectivity (ODBC), which is a low-level, high-performance interface that is designed specifically for relational data stores.
  • The RPresto package implements a DBI-compliant interface to Presto, an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
  • The dplyr.teradata package is the Teradata backend for dplyr package, which makes it possible to operate Teradata database in the same way as manipulating data frames with dplyr.
  • The RJDBC package is an implementation of R's DBI interface using JDBC as a back-end. This allows R to connect to any DBMS that has a JDBC driver.
  • The implyr package provides the back-end for Apache Impala, which enables low-latency SQL queries on data stored in the Hadoop Distributed File System (HDFS), Apache HBase, Apache Kudu, Amazon Simple Storage Service (S3), Microsoft Azure Data Lake Store (ADLS), and Dell EMC Isilon.

Non-Relational Databases

This section includes packages that provides access to non-relational databases within R.

  • Packages for Redis, an open-source, in-memory data structure store that can be used as a database, cache and message broker:
    • The RcppRedis package provides interface to Redis using the hiredis library.
    • The redux package provides a low-level interface to Redis, allowing execution of arbitrary Redis commands with almost no interface, and a high-level generated interface to more than 200 redis commands.
  • Packages for Elasticsearch, an open-source, RESTful, distributed search and analytics engine:
    • The elastic package provides a general purpose interface to Elasticsearch.
    • The uptasticsearch package is a Elasticsearch client tailored to data science workflows.
  • The mongolite package provides a high-level, high-performance MongoDB client based on libmongoc, including support for aggregation, indexing, map-reduce, streaming, SSL encryption and SASL authentication.
  • The R4CouchDB package provides a collection of functions for basic database and document management operations in CouchDB.
  • The RCassandra package provides a direct interface (without the use of Java) to the most basic functionality of Apache Cassanda such as login, updates and queries.
  • The aws.dynamodb package provides access to Amazon DynamoDB.
  • The rrocksdb package provides access to RocksDB.

Databases Tools

This section includes packages that provides tools for working and testing with databases, databases table manipulations, etc.

  • The pool package enables the creation of object pools, which make it less computationally expensive to fetch a new object.
  • The DBItest package is a helper that tests DBI back ends for conformity to the interface.
  • The dbfaker package is to tool to ensure the validity of database writes. It provides a set of utilities to analyze and type check the properties of data frames that are to be written to databases with SQL support.
  • The dbplyr package is a dplyr back-end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features works with any database that has a DBI back-end; more advanced features require SQL translation to be provided by the package author.
  • The pivot package extends the tidyverse packages dbplyr and tidyr functionality with pivot(), i.e. spread(), and unpivot(), i.e. gather(), for reshaping remote tables. Currently only 'Microsoft SQL Server' is supported.
  • The sqldf package provides functionalities to manipulate R Data Frames Using SQL.
  • The pointblank package provides tools to validate data tables in databases such as PostgreSQL and MySQL.
  • The TScompare package provides utilities for comparing the equality of series on two databases.

View on CRAN

29 days ago

Yuan Tang

Packages

4
47
84
92
191
298
325
449
517
677
788
870
1086
1172
1207
1423
1566
2218
2431
2559
2742
2976
3633
3735
5419
5834
6997
7701
8280
8319
9126
10244

Discussions