WebTechnologies

Web Technologies and Services

This Task View contains information about how to use R and the World Wide Web together. The base version of R does not ship with many tools for interacting with the web. Thankfully, a growing number of tools are available for this purpose. This task view focuses on packages for obtaining web-based data and information, frameworks for building web-based R applications, and online services that can be accessed from R. A list of available packages and functions is presented below, grouped by the type of activity. The Open Data Task View provides further discussion of online data sources that can be accessed from R.

If you have any comments or suggestions for additions or improvements for this Task View, go to GitHub and submit an issue, or make some changes and submit a pull request. If you can't contribute on GitHub, email the Task View maintainer. If you have an issue with one of the packages discussed below, please contact the maintainer of that package. If you know of a web service, API, data source, or other online resource that is not yet supported by an R package, consider adding it to the package development to-do list on GitHub.

Tools for Working with the Web from R

Core Tools For HTTP Requests

There are two packages that should cover most use cases of interacting with the web from R. httr provides a user-friendly interface for executing HTTP methods (GET, POST, PUT, HEAD, DELETE, etc.) and provides support for modern web authentication protocols (OAuth 1.0, OAuth 2.0). HTTP status codes are helpful for debugging HTTP calls. httr makes this easier using, for example, stop_for_status(), which gets the HTTP status code from a response object and stops the function if the call was not successful. (See also warn_for_status().) Note that you can pass additional libcurl options to the config parameter in HTTP calls. RCurl is a lower-level package that provides a closer interface between R and the libcurl C library, but is less user-friendly. It may be useful for operations on web-based XML or to perform FTP operations. For more specific situations, the following resources may be useful:

  • curl is another libcurl client that provides the curl() function as an SSL-compatible replacement for base R's url() and support for HTTP/2, SSL (https, ftps), gzip, deflate and more. For websites serving insecure HTTP (i.e., using the "http" rather than the "https" prefix), most R functions can extract data directly, including read.table and read.csv; this also applies to functions in add-on packages such as jsonlite::fromJSON and XML::xmlParse. httpRequest is another low-level package for HTTP requests that implements the GET, POST and multipart POST verbs.
  • request (GitHub) provides a high-level package that is useful for developing other API client packages. httping (GitHub) provides simplified tools to ping and time HTTP requests, around httr calls. httpcache (GitHub) provides a mechanism for caching HTTP requests.
  • For dynamically generated webpages (i.e., those requiring user interaction to display results), RSelenium (GitHub) can be used to automate those interactions and extract page contents. It provides a set of bindings for the Selenium 2.0 webdriver using the JsonWireProtocol. It can also aid in automated application testing, load testing, and web scraping. rdom (not on CRAN) uses phantomjs to access a webpage's Document Object Model (DOM).
  • Another, higher-level alternative package useful for webscraping is rvest (GitHub), which is designed to work with magrittr to make it easy to express common web scraping tasks.
  • Many base R tools can be used to download web content, provided that the website does not use SSL (i.e., the URL does not have the "https" prefix). download.file() is a general purpose function that can be used to download a remote file. For SSL, the download() function in downloader wraps download.file(), and takes all the same arguments.
  • Tabular data sets (e.g., txt, csv, etc.) can be input using read.table(), read.csv(), and friends, again assuming that the files are not hosted via SSL. An alternative is to use httr::GET (or RCurl::getURL) to first read the file into R as a character vector before parsing with read.table(text=...), or you can download the file to a local directory. rio (GitHub) provides an import() function that can read a number of common data formats directly from an https:// URL. The repmis function source_data() can load and cache plain-text data from a URL (either http or https). That package also includes source_Dropbox() for downloading/caching plain-text data from non-public Dropbox folders and source_XlsxData() for downloading/caching Excel xlsx sheets.
  • Authentication: Using web resources can require authentication, either via API keys, OAuth, a username:password combination, or via other means. Additionally, some web resources require that authentication be in the header of an HTTP call, which requires a little bit of extra work. API keys and username:password combos can be combined within a URL for a call to a web resource (API key: http://api.foo.org/?key=yourkey; user/pass: http://username:password@api.foo.org), or can be specified via commands in RCurl or httr. OAuth is the most complicated authentication process, and can be most easily done using httr. See the 6 demos within httr, three for OAuth 1.0 (linkedin, twitter, vimeo) and three for OAuth 2.0 (facebook, GitHub, google). ROAuth is a package that provides a separate R interface to OAuth. OAuth is easier to do in httr, so start there. googleAuthR provides an OAuth 2.0 setup specifically for Google web services.
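
The approach mentioned above for tabular data, reading a remote file into R as a character vector and then parsing it with read.table(text = ...) or read.csv(text = ...), can be sketched as follows. This is a minimal illustration using base R only. The helper name parse_csv_text() and the sample data are hypothetical, and the string stands in for a response body such as the one returned by httr::content(response, as = "text").

```r
# Hypothetical helper: parse CSV text that is already held in memory,
# for example the body of an HTTPS response fetched with httr or RCurl.
parse_csv_text <- function(txt) {
  utils::read.csv(text = txt, stringsAsFactors = FALSE)
}

# Stand-in for a downloaded response body (made-up data).
body <- "id,name,score\n1,alpha,0.5\n2,beta,1.5\n"

dat <- parse_csv_text(body)
str(dat)
```

Keeping the download step and the parsing step separate makes it possible to inspect the HTTP status code before parsing.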

Parsing Structured Web Data

The vast majority of web-based data is structured as plain text, HTML, XML, or JSON (JavaScript Object Notation). Web service APIs increasingly rely on JSON, but XML is still prevalent in many applications. There are several packages for specifically working with these formats. These functions can be used to interact directly with insecure webpages or can be used to parse locally stored or in-memory web files.

  • XML/HTML: There are two packages for working with XML: XML and xml2 (GitHub). Both support general XML (and HTML) parsing, including XPath queries. The package xml2 is less fully featured, but more user friendly with respect to memory management, classes (e.g., XML node vs. node set vs. document), and namespaces. Of the two, only XML supports de novo creation of XML nodes and documents. The XML2R (GitHub) package is a collection of convenient functions for coercing XML into data frames. selectr parses CSS3 selectors and translates them to XPath 1.0 expressions, so CSS selectors can be used instead of XPath when querying XML and HTML documents with the XML package. The selectorgadget browser extension can be used to identify page elements. RHTMLForms reads HTML documents and obtains a description of each of the forms it contains, along with the different elements and hidden fields. scrapeR provides additional tools for scraping data from HTML and XML documents. htmltab extracts structured information from HTML tables, similar to XML::readHTMLTable, but automatically expands row and column spans in the header and body cells, and gives users more control over the identification of header and body rows that will end up in the R table.
  • JSON: There are several packages for reading and writing JSON: rjson, RJSONIO, and jsonlite. jsonlite includes a different parser from RJSONIO called yajl. We recommend using jsonlite. Check out the paper describing jsonlite by Jeroen Ooms http://arxiv.org/abs/1403.2805. tidyjson (GitHub) converts JSON into a data.frame. jqr provides bindings for the fast JSON library, jq. jsonvalidate (GitHub) validates JSON against a schema using the "is-my-json-valid" Node.js library.
  • RSS/Atom: feedeR (not on CRAN) can be used to parse RSS or Atom feeds.
  • swagger (not on CRAN) can be used to automatically generate functions for working with a web service API that provides documentation in Swagger.io format.
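
As a small illustration of the JSON tools listed above, the following sketch parses an in-memory JSON string with jsonlite and serializes the result again. It assumes that jsonlite is installed; the JSON text and the variable names are made up for the example.

```r
library(jsonlite)

# Made-up JSON payload: an array of records, as many web APIs return.
json_text <- '[{"name":"a","value":1},{"name":"b","value":2}]'

# An array of flat objects is simplified to a data frame by default.
records <- fromJSON(json_text)
print(records)

# Convert the data frame back into JSON text.
round_trip <- toJSON(records)
print(round_trip)
```

The same fromJSON() call also accepts a file path or URL in place of the string.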

Tools for Working with URLs

  • The httr::parse_url() function can be used to extract portions of a URL. The RCurl::URLencode() and utils::URLencode() functions can be used to encode character strings for use in URLs. utils::URLdecode() decodes back to the original strings. urltools (GitHub) can also handle URL encoding, decoding, parsing, and parameter extraction.
  • The tldextract package extracts top level domains and subdomains from a host name. It's a port of a Python library of the same name.
  • iptools can facilitate working with IPv4 addresses, including for use in geolocation.
  • urlshorteneR (GitHub) offers URL expansion and analysis for Bit.ly, Goo.gl, and is.gd. longurl uses the longurl.org API to provide similar functionality.
  • gdns (not on CRAN) provides access to Google's secure HTTP-based DNS resolution service.
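
The encoding and decoding functions in utils can be combined to build a query string by hand. The sketch below uses base R only. The helper build_query() and the parameter names are hypothetical, and api.foo.org is the same placeholder host used earlier in this Task View.

```r
# Hypothetical helper: percent-encode names and values, then join them
# into a key=value&key=value query string.
build_query <- function(params) {
  keys <- vapply(names(params), utils::URLencode, character(1),
                 reserved = TRUE)
  vals <- vapply(params,
                 function(v) utils::URLencode(as.character(v), reserved = TRUE),
                 character(1))
  paste(paste0(unname(keys), "=", unname(vals)), collapse = "&")
}

params <- list(q = "web scraping", lang = "R")
query <- build_query(params)
url <- paste0("http://api.foo.org/?", query)
print(url)

# URLdecode() reverses the percent-encoding.
print(utils::URLdecode("web%20scraping"))
```

Setting reserved = TRUE also encodes characters such as "&" and "=", which would otherwise be read as query-string delimiters.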

Tools for Working with Scraped Webpage Contents

  • Several packages can be used for parsing HTML documents. boilerpipeR provides generic extraction of main text content from HTML files; removal of ads, sidebars and headers using the boilerpipe Java library. RTidyHTML interfaces to the libtidy library for correcting HTML documents that are not well-formed. This library corrects common errors in HTML documents. W3CMarkupValidator provides an R Interface to W3C Markup Validation Services for validating HTML documents.
  • For XML documents, the XMLSchema package provides facilities in R for reading XML schema documents and processing them to create definitions for R classes and functions for converting XML nodes to instances of those classes. It provides the framework for meta-computing with XML schema in R. xslt is a package providing an interface to xmlwrapp, an XML processing library that provides an XSLT engine for transforming XML data using a transform stylesheet. (It can be seen as a modern replacement for Sxslt, which is an interface to Dan Veillard's libxslt translator, and the SXalan package.) This may be useful for webscraping, as well as transforming XML markup into another human- or machine-readable format (e.g., HTML, JSON, plain text, etc.). SSOAP provides a client-side SOAP (Simple Object Access Protocol) mechanism. It aims to provide a high-level interface to invoke SOAP methods provided by a SOAP server. XMLRPC provides an implementation of XML-RPC, a relatively simple remote procedure call mechanism that uses HTTP and XML. This can be used for communicating between processes on a single machine or for accessing Web services from within R.
  • Rcompression (not on CRAN): Interface to zlib and bzip2 libraries for performing in-memory compression and decompression in R. This is useful when receiving or sending contents to remote servers, e.g. Web services, HTTP requests via RCurl.
  • tm.plugin.webmining: Extensible text retrieval framework for news feeds in XML (RSS, ATOM) and JSON formats. Currently, the following feeds are implemented: Google Blog Search, Google Finance, Google News, NYTimes Article Search, Reuters News Feed, Yahoo Finance and Yahoo Inplay.
  • webshot uses PhantomJS to provide screenshots of web pages without a browser. It can be useful for testing websites (such as Shiny applications).

Other Useful Packages and Functions

  • JavaScript: V8 (GitHub) is an R interface to Google's open source, high performance JavaScript engine. It can wrap JavaScript libraries as well as NPM packages. The SpiderMonkey package provides another means of evaluating JavaScript code, creating JavaScript objects and calling JavaScript functions and methods from within R. This can work by embedding the JavaScript engine within an R session or by embedding R in a browser such as Firefox and being able to call R from JavaScript and call back to JavaScript from R.
  • Email: mailR is an interface to Apache Commons Email to send emails from within R. sendmailR provides a simple SMTP client. gmailr provides access to Google's gmail.com RESTful API.
  • Miscellaneous: webutils (GitHub) contains various functions for developing web applications, including parsers for application/x-www-form-urlencoded as well as multipart/form-data. mime (GitHub) guesses the MIME type for a file from its extension. rsdmx (GitHub) provides tools to read data and metadata documents exchanged through the Statistical Data and Metadata Exchange (SDMX) framework. The package currently focuses on the SDMX XML standard format (SDMX-ML). robotstxt (not on CRAN) provides R6 classes for parsing and checking robots.txt files. uaparserjs (GitHub) uses the javascript "ua-parser" library to parse User-Agent HTTP headers.

Web and Server Frameworks

  • DeployR Open is a server-based framework for integrating R into other applications via Web Services.
  • The shiny package makes it easy to build interactive web applications with R.
  • Other web frameworks include: fiery (GitHub) that is meant to be more flexible but less easy to use than shiny; prairie (not on CRAN) which is a lightweight web framework that uses magrittr-style syntax and is modeled after expressjs; rcloud (not on CRAN) which provides an iPython notebook-style web-based R interface; and Rook, which contains the specification and convenience software for building and running Rook applications.
  • The opencpu framework for embedded statistical computation and reproducible research exposes a web API interfacing R, LaTeX and Pandoc. This API is used for example to integrate statistical functionality into systems, share and execute scripts or reports on centralized servers, and build R based apps.
  • Several general purpose server/client frameworks for R exist. Rserve and RSclient provide server and client functionality for TCP/IP or local socket interfaces. httpuv provides low-level socket and protocol support for handling HTTP and WebSocket requests directly within R. A related package, which httpuv perhaps replaces, is websockets. servr provides a simple HTTP server, based on httpuv, to serve files under a given directory.
  • Several packages offer functionality for turning R code into a web API. jug is a simple API-builder web framework, built around httpuv. FastRWeb provides some basic infrastructure for this. plumber allows you to create a REST API by decorating existing R source code.
  • The WADL package provides tools to process Web Application Description Language (WADL) documents and to programmatically generate R functions to interface to the REST methods described in those WADL documents. (not on CRAN)
  • The RDCOMServer package provides a mechanism to export R objects as (D)COM objects in Windows. It can be used along with the RDCOMClient package which provides user-level access from R to other COM servers. (not on CRAN)
  • rapporter.net provides an online environment (SaaS) to host and run rapport statistical report templates in the cloud.
  • radiant (Archived on CRAN) is a Shiny-based GUI for R that runs in a browser from a server or local machine.
  • neocities wraps the API for the Neocities web hosting service. (not on CRAN)
  • The Tiki Wiki CMS/Groupware framework has an R plugin (PluginR) to run R code from wiki pages, and use data from their own collected web databases (trackers). A demo: http://r.tiki.org/. More info in a useR!2013 presentation.
  • MediaWiki has an extension (Extension:R) to run R code from wiki pages, and use uploaded data. Links to demo pages (in German) can be found at the category page for R scripts at MM-Stat. A mailing list is available: R-sig-mediawiki.
  • whisker: Implementation of logicless templating based on Mustache in R. Mustache syntax is described in http://mustache.github.io/mustache.5.html
  • CGIwithR (not on CRAN) allows one to use R scripts as CGI programs for generating dynamic Web content. HTML forms and other mechanisms to submit dynamic requests can be used to provide input to R scripts via the Web to create content that is determined within that R script.

Web Services

Cloud Computing and Storage

  • Amazon Web Services is a popular, proprietary cloud service offering a suite of computing, storage, and infrastructure tools. aws.signature provides functionality for generating AWS API request signatures.
    • Simple Storage Service (S3) is a commercial service that allows one to store content and retrieve it from any machine connected to the Internet. RAmazonS3 and s3mpi (not on CRAN) provide basic infrastructure for communicating with S3. AWS.tools (GitHub) interacts with S3 and EC2 using the AWS command line interface (an external system dependency). The CRAN version is archived. awsConnect (not on CRAN) is another package using the AWS Command Line Interface to control EC2 and S3, which is only available for Linux and Mac OS.
    • Elastic Compute Cloud (EC2) is a cloud computing service. AWS.tools and awsConnect (not on CRAN) both use the AWS command line interface to control EC2. segue (not on CRAN) is another package for managing EC2 instances and S3 storage, which includes a parallel version of lapply() for the Elastic Map Reduce (EMR) engine called emrlapply(). It uses Hadoop Streaming on Amazon's EMR in order to get simple parallel computation.
    • DBREST: RAmazonDBREST provides an interface to Amazon's Simple DB API.
    • The cloudyr project, which is currently under active development on GitHub, aims to provide a unified interface to the full Amazon Web Services suite without the need for external system dependencies.
  • Cloud Storage: googleCloudStorageR interfaces with Google Cloud Storage. boxr (GitHub) is a lightweight, high-level interface for the box.com API. rDrop2 (GitHub; not on CRAN) is a Dropbox interface that provides access to a full suite of file operations, including dir/copy/move/delete operations, account information (including quotas) and the ability to upload and download files from any Dropbox account. backblazer (GitHub) provides access to the Backblaze B2 storage API.
  • Docker: analogsea is a general purpose client for the Digital Ocean v2 API. In addition, the package includes functions to install various R tools including base R, RStudio server, and more. There's an improving interface to interact with docker on your remote droplets via this package.
  • rcrunch (not on CRAN) provides an interface to crunch.io storage and analytics.
  • rrefine (not on CRAN) provides a client for the OpenRefine (formerly Google Refine) data cleaning service.

Document and Code Sharing

  • Code Sharing: gistr (GitHub) works with GitHub gists (gist.github.com) from R, allowing you to create new gists, update gists with new files, rename files, delete files, get and delete gists, star and un-star gists, fork gists, open a gist in your default browser, get embed code for a gist, list gist commits, and get rate limit information when authenticated. git2r provides bindings to the git version control system and rgithub (not on CRAN) provides access to the GitHub.com API, both of which can facilitate code or data sharing via GitHub. gitlabr is a GitLab-specific client.
  • Data archiving: dvn (GitHub) provides access to The Dataverse Network API. rfigshare (GitHub) connects with Figshare.com. dataone (GitHub) provides read/write access to data and metadata from the DataONE network of Member Node data repositories.
  • Google Drive/Google Documents: driver (not on CRAN) is a thin client for the Google Drive API. The RGoogleDocs package is an example of using the RCurl and XML packages to quickly develop an interface to the Google Documents API. RGoogleStorage provides programmatic access to the Google Storage API. This allows R users to access and store data on Google's storage. We can upload and download content, create, list and delete folders/buckets, and set access control permissions on objects and buckets.
  • Google Sheets: googlesheets (GitHub) can access private or public Google Sheets by title, key, or URL. Extract data or edit data. Create, delete, rename, copy, upload, or download spreadsheets and worksheets. gsheet (GitHub) can download Google Sheets using just the sharing link. Spreadsheets can be downloaded as a data frame, or as plain text to parse manually.
  • imguR (GitHub) is a package to share plots using the image hosting service Imgur.com. knitr also has a function imgur_upload() to upload images from literate programming documents.
  • rscribd (not on CRAN): API client for publishing documents to Scribd.

Data Analysis and Processing Services

  • Crowdsourcing: Amazon Mechanical Turk is a paid crowdsourcing platform that can be used to semi-automate tasks that are not easily automated. MTurkR (GitHub) provides access to the Amazon Mechanical Turk Requester API. microworkers (not on CRAN) can distribute tasks and retrieve results for the Microworkers.com platform.
  • Geolocation/Geocoding: Several packages connect to geolocation/geocoding services. rgeolocate (GitHub) offers several online and offline tools. rydn (not on CRAN) is an interface to the Yahoo Developers network geolocation APIs, and ipapi (GitHub) can be used to geolocate IPv4/6 addresses and/or domain names using the ip-api.com API. threewords connects to the What3Words API, which represents every 3-meter by 3-meter square on earth as a three-word phrase. opencage (GitHub) provides access to the OpenCage geocoding service. geoparser (GitHub) interfaces with the Geoparser.io web service to identify place names from plain text. nominatim (not on CRAN) connects to the OpenStreetMap Nominatim API for reverse geocoding. PostcodesioR (not on CRAN) provides post code lookup and geocoding for the United Kingdom.
  • Image Processing: RoogleVision (not on CRAN) links to the Google Cloud Vision image recognition service.
  • Machine Learning as a Service: Several packages provide access to cloud-based machine learning services. AzureML links with the Microsoft Azure machine learning service. bigml (GitHub) connects to BigML. ddeploy wraps the Duke Analytics model deployment API. indicoio (Archived on CRAN) connects to APIs at https://indico.io/, with wrappers for Positive/Negative Sentiment Analysis, Political Sentiment Analysis, Image Feature Extraction, Facial Emotion Recognition, Facial Feature Extraction, and Language Detection. clarifai (GitHub) is a Clarifai.com client that enables automated image description. rLTP (GitHub) accesses the ltp-cloud service. googlepredictionapi (not on CRAN) is an R client for the Google Prediction API, a suite of cloud machine learning tools. RDataCanvas (GitHub) can write a module for datacanvas.io, a big data analytics platform. yhatr lets you deploy, maintain, and invoke models via the Yhat REST API. datarobot works with Data Robot's predictive modeling platform. mscsweblm4r (GitHub) interfaces with the Microsoft Cognitive Services Web Language Model API and mscstexta4r (GitHub) uses the Microsoft Cognitive Services Text Analytics REST API. rosetteApi links to the Rosette text analysis API.
  • Machine Translation: translate provides bindings for the Google Translate API v2 and translateR provides bindings for both Google and Microsoft translation APIs. mstranslator (GitHub) provides an R wrapper for the Microsoft Translator API. RYandexTranslate (GitHub) connects to Yandex Translate. transcribeR provides automated audio transcription via the HP IDOL service.
  • Document Processing: abbyyR (GitHub) and captr (GitHub) connect to optical character recognition (OCR) APIs. pdftables (GitHub) uses the PDFTables.com webservice to extract tables from PDFs.
  • Mapping: osmar provides infrastructure to access OpenStreetMap data from different sources, to work with the data in a common R manner, and to convert data into available infrastructure provided by existing R packages (e.g., into sp and igraph objects). osrm (GitHub) provides shortest paths and travel times from OpenStreetMap. osmplotr (GitHub) extracts customizable map images from OpenStreetMap. RgoogleMaps serves two purposes: it provides a comfortable R interface to query the Google server for static maps, and it uses the map as a background image to overlay plots within R. R2GoogleMaps provides a mechanism to generate JavaScript code from R that displays data using Google Maps. placement (GitHub) provides drive time and geolocation services from Google Maps. RKMLDevice allows users to create R graphics in Keyhole Markup Language (KML) format in a manner that allows them to be displayed on Google Earth (or Google Maps), and RKML provides users with high-level facilities to generate KML. plotKML can visualize spatial and spatio-temporal objects in Google Earth. plotGoogleMaps plots SP or SPT (STIDF, STFDF) data as an HTML map mashup over Google Maps. ggmap allows for the easy visualization of spatial data and models on top of Google Maps, OpenStreetMaps, Stamen Maps, or CloudMade Maps using ggplot2. leafletR allows you to display your spatial data on interactive web-maps using the open-source JavaScript library Leaflet. CartoDB (not on CRAN) provides an API interface to Cartodb.com. openadds (GitHub) is an Openaddresses client.
  • Online Surveys: qualtrics (not on CRAN) provides functions to interact with Qualtrics. WufooR (GitHub) can retrieve data from Wufoo.com forms. redcapAPI (GitHub) can provide access to data stored in a REDCap (Research Electronic Data CAPture) database, which is a web application for building and managing online surveys and databases developed at Vanderbilt University. formr facilitates use of the formr survey framework, which is built on openCPU. Rexperigen is a client for the Experigen experimental platform.
  • Visualization: Plot.ly is a company that allows you to create visualizations on the web using R (and Python). They have an R package in development (not on CRAN), as well as access to their services via a REST API. googleVis provides an interface between R and the Google chart tools. The RUbigraph package provides an R interface to a Ubigraph server for drawing interactive, dynamic graphs. You can add and remove vertices/nodes and edges in a graph and change their attributes/characteristics such as shape, color, size. Interactive, JavaScript-enabled graphics are an increasingly useful output format for data analysis. ggvis makes it easy to describe interactive web graphics in R. It fuses the ideas of ggplot2 and shiny, rendering graphics on the web with Vega. d3Network provides tools for creating D3 JavaScript network, tree, dendrogram, and Sankey graphs from R. rCharts (not on CRAN) and clickme (not on CRAN) allow for interactive JavaScript charts from R. animint (not on CRAN) allows an interactive animation to be defined using a list of ggplots with clickSelects and showSelected aesthetics, then exported to CSV/JSON/D3/JavaScript for viewing in a web browser. rVega (not on CRAN) is an R wrapper for Vega.

Social Media Clients

  • plusser has been designed to facilitate the retrieval of Google+ profiles, pages and posts. It also provides search facilities. Currently a Google+ API key is required for accessing Google+ data.
  • Rfacebook provides an interface to the Facebook API.
  • The Rflickr package provides an interface to the Flickr photo management and sharing application Web service. (not on CRAN)
  • instaR (GitHub) is a client for the Instagram API.
  • Rlinkedin (not on CRAN) is a client for the LinkedIn API. Auth is via OAuth.
  • SocialMediaMineR is an analytic tool that returns information about the popularity of a URL on social media sites.
  • tumblR (GitHub) is a client for the Tumblr API (https://www.tumblr.com/docs/en/api/v2). Tumblr is a microblogging platform and social networking website https://www.tumblr.com/.
  • Twitter: twitteR provides an interface to the Twitter web API. RTwitterAPI (not on CRAN) and rtweet (not on CRAN) are other Twitter clients. twitterreport (not on CRAN) focuses on report generation based on Twitter data. streamR provides a series of functions that allow users to access Twitter's filter, sample, and user streams, and to parse the output into data frames. OAuth authentication is supported. tweet2r is an alternative implementation geared toward SQLite and postGIS databases. graphTweets produces a network graph from a data.frame of tweets. tweetscores (not on CRAN) implements a political ideology scaling measure for specified Twitter users.

Web Analytics Services

  • Google Trends: GTrendsR (not on CRAN) offers functions to perform and display Google Trends queries. Another GitHub package (rGtrends) is now deprecated, but supported a previous version of Google Trends and may still be useful for developers. RGoogleTrends provides another alternative.
  • Google Analytics: googleAnalyticsR (GitHub), RGoogleAnalytics (GitHub), ganalytics (GitHub; not on CRAN), GAR (GitHub), and RGA provide functions for accessing and retrieving data from the Google Analytics APIs. The latter supports OAuth 2.0 authorization. RGA provides a shiny app to explore data. searchConsoleR (GitHub) links to the Google Search Console (formerly Webmaster Tools).
  • Online Advertising: fbRads can manage Facebook ads via the Facebook Marketing API. RDoubleClick (not on CRAN) can retrieve data from Google's DoubleClick Campaign Manager Reporting API. RSmartlyIO (GitHub) loads Facebook and Instagram advertising data provided by Smartly.io.
  • Other services: RSiteCatalyst has functions for accessing the Adobe Analytics (Omniture SiteCatalyst) Reporting API.
  • RAdwords (GitHub) is a package for loading Google Adwords data.
  • webreadr (GitHub) can process various common forms of request log, including the Common and Combined Web Log formats and AWS logs.
  • ApacheLogProcessor (GitHub) can process Apache Web Server log files.
  • RMixpanel provides an interface to many endpoints of Mixpanel's Data Export, Engage and JQL API.

Other Web Services

  • Fitness Apps: fitbitScraper (GitHub) retrieves Fitbit data. RGoogleFit provides similar functionality for Google Fit.

  • Push Notifications: RPushbullet provides an easy-to-use interface for the Pushbullet service which provides fast and efficient notifications between computers, phones and tablets. pushoverr (GitHub) can send push notifications to mobile devices (iOS and Android) and desktop using Pushover.
  • Reference/bibliography/citation management: RefManageR imports and manages BibTeX and BibLaTeX references. RMendeley is an implementation of the Mendeley API in R. It has been archived on CRAN temporarily until it is updated for the new Mendeley API. rmetadata (not on CRAN) can get scholarly metadata from around the web. rorcid (GitHub) is a programmatic interface to the Orcid.org API, which can be used for identifying scientific authors and their publications (e.g., by DOI). rplos is a programmatic interface to the Web Service methods provided by the Public Library of Science journals for search. rpubmed (not on CRAN) provides tools for extracting and processing Pubmed and Pubmed Central records, and europepmc (GitHub) connects to the Europe PubMed Central service. scholar provides functions to extract citation data from Google Scholar. Convenience functions are also provided for comparing multiple scholars and predicting future h-index values. pubmed.mineR is a package for text mining of PubMed Abstracts that supports fetching text and XML from PubMed. rdatacite (GitHub) connects to DataCite. oai (GitHub) and OAIHarvester harvest metadata using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) standard. JSTORr (not on CRAN) provides simple text mining of journal articles from JSTOR's Data for Research service. aRxiv (GitHub) is a client for the API of arXiv, a repository of electronic preprints for computer science, mathematics, physics, quantitative biology, quantitative finance, and statistics.
  • Wikipedia: WikipediR (GitHub) is a wrapper for the MediaWiki API, aimed particularly at the Wikimedia 'production' wikis, such as Wikipedia. rwikidata and WikidataR (GitHub) can request data from Wikidata.org, the free knowledgebase. wikipediatrend (GitHub) provides access to Wikipedia page access statistics. WikiSocio can retrieve contributor lists and revision data.
  • bigrquery (GitHub): An interface to Google's BigQuery.
  • cymruservices queries Team Cymru web security services.
  • datamart: Provides an S4 infrastructure for unified handling of internal datasets and web based data sources. Examples include dbpedia, eurostat and sourceforge.
  • discgolf (GitHub) provides a client to interact with the API for the Discourse web forum platform. The API is for an installed instance of Discourse, not for the Discourse site itself.
  • factualR: Thin wrapper for the Factual.com server API.
  • GFusionTables (not on CRAN): An interface to Google Fusion Tables, a data management system in the cloud. This package provides functions to browse the Fusion Tables catalog, retrieve data from Fusion Tables storage into R, and upload data from R to Fusion Tables.
  • HIBPwned (not on CRAN) is a client for Have I Been Pwned.
  • infochimps (GitHub; archived) is an R wrapper for the infochimps.com API services.
  • internetarchive (not on CRAN): API client for internet archive metadata.
  • jSonarR: Enables users to access MongoDB by running queries and returning their results in data.frames. jSonarR uses data processing and conversion capabilities in the jSonar Analytics Platform and the JSON Studio Gateway, to convert JSON to a tabular format.
  • livechatR is a client for the LiveChat API.
  • lucr performs currency conversions using Open Exchange Rates.
  • mockaRoo (not on CRAN) uses the MockaRoo API to generate mock or fake data based on an input schema.
  • osi (GitHub) retrieves open source license data and metadata from https://api.opensource.org/licenses/.
  • randNames (GitHub) generates random names and personal identifying information using the https://randomapi.com/ API.
  • Rbitcoin allows both public and private API calls to interact with Bitcoin. rbitcoinchartsapi is a package for the BitCoinCharts.com API. From their website: "Bitcoincharts provides financial and technical data related to the Bitcoin network and this data can be accessed via a JSON application programming interface (API)."
  • Rblpapi (GitHub) is a client for Bloomberg Finance L.P. ROpenFIGI (GitHub) provides an interface to Bloomberg's OpenFIGI API.
  • rerddap (GitHub; not on CRAN): A generic R client to interact with any ERDDAP instance, which is a special case of OPeNDAP (https://en.wikipedia.org/wiki/OPeNDAP), or Open-source Project for a Network Data Access Protocol. It allows the user to swap out the base URL to use any ERDDAP instance.
  • ripplerestr provides an interface to the Ripple protocol for making financial transactions.
  • restimizeapi provides an interface to trading website estimize.com.
  • RForcecom provides a connection to Force.com and Salesforce.com.
  • Rgoodreads (not on CRAN) interacts with Goodreads.
  • RLastFM (archived on CRAN) is a package to interface to the last.fm API.
  • ROpenWeatherMap is a client for location-based weather data and forecasting from Open Weather Map.
  • RSocrata accesses data from Socrata open data portals. soql is a pipe-oriented set of tools for constructing Socrata queries.
  • RStripe provides an interface to Stripe, an online payment processor.
  • RZabbix links with the Zabbix network monitoring service API.
  • shopifyr: An interface to the API of the e-commerce service Shopify (https://help.shopify.com/api).
  • slackr (GitHub) is a client for the Slack.com messaging platform.
  • SlideShaRe (not on CRAN) is a client for Slideshare.
  • stackr (not on CRAN): An unofficial wrapper for the read-only features of the Stack Exchange API.
  • telegram (GitHub) connects with the Telegram Bot API.
  • tuber (not on CRAN): A YouTube API client.
  • udapi (not on CRAN) connects to Urban Dictionary.
  • yummlyr (GitHub) provides an interface to the Yummly recipe database.
  • zendeskR: This package provides a wrapper for the Zendesk API.
  • ZillowR is a client for the Zillow real estate service.

Thomas Leeper, Scott Chamberlain, Patrick Mair, Karthik Ram, Christopher Gandrud