arrow 2.0.0

Datasets

AWS S3 support

Flight RPC

Flight is a general-purpose client-server framework for high performance transport of large datasets over network interfaces. The arrow R package now provides methods for connecting to Flight RPC servers to send and receive data. See vignette("flight", package = "arrow") for an overview.

Computation

Packaging and installation

Bug fixes and other enhancements

arrow 1.0.1

Bug fixes

arrow 1.0.0

Arrow format conversion

Datasets

Other enhancements

Bug fixes and deprecations

Installation and packaging

arrow 0.17.1

arrow 0.17.0

Feather v2

This release includes support for version 2 of the Feather file format. Feather v2 features full support for all Arrow data types, fixes the 2GB per-column limitation for large amounts of string data, and it allows files to be compressed using either lz4 or zstd. write_feather() can write either version 2 or version 1 Feather files, and read_feather() automatically detects which file version it is reading.

Related to this change, several functions around reading and writing data have been reworked. read_ipc_stream() and write_ipc_stream() have been added to facilitate writing data to the Arrow IPC stream format, which is slightly different from the IPC file format (Feather v2 is the IPC file format).

Behavior has been standardized: all read_<format>() return an R data.frame (default) or a Table if the argument as_data_frame = FALSE; all write_<format>() functions return the data object, invisibly. To facilitate some workflows, a special write_to_raw() function is added to wrap write_ipc_stream() and return the raw vector containing the buffer that was written.

To achieve this standardization, read_table(), read_record_batch(), read_arrow(), and write_arrow() have been deprecated.

Python interoperability

The 0.17 Apache Arrow release includes a C data interface that allows exchanging Arrow data in-process at the C level without copying and without libraries having a build or runtime dependency on each other. This enables us to use reticulate to share data between R and Python (pyarrow) efficiently.

See vignette("python", package = "arrow") for details.

Datasets

Installation

Other bug fixes and enhancements

arrow 0.16.0.2

arrow 0.16.0

Multi-file datasets

This release includes a dplyr interface to Arrow Datasets, which let you work efficiently with large, multi-file datasets as a single entity. Explore a directory of data files with open_dataset() and then use dplyr methods to select(), filter(), etc. Work will be done where possible in Arrow memory. When necessary, data is pulled into R for further computation. dplyr methods are conditionally loaded if you have dplyr available; it is not a hard dependency.

See vignette("dataset", package = "arrow") for details.

Linux installation

A source package installation (as from CRAN) will now handle its C++ dependencies automatically. For common Linux distributions and versions, installation will retrieve a prebuilt static C++ library for inclusion in the package; where this binary is not available, the package executes a bundled script that should build the Arrow C++ library with no system dependencies beyond what R requires.

See vignette("install", package = "arrow") for details.

Data exploration

Compression

Other fixes and improvements

arrow 0.15.1

arrow 0.15.0

Breaking changes

New features

Other upgrades

arrow 0.14.1

Initial CRAN release of the arrow package. Key features include: