# binneR

## Introduction

The binneR package provides a spectral binning approach for routine processing of flow infusion electrospray - high resolution mass spectrometry (FIE-HRMS) metabolomics fingerprinting experiments, the results of which can then be used for subsequent statistical analyses.

Spectral binning rounds high resolution fingerprinting data to a specified amu bin width. FIE-HRMS data consist of a ‘plug flow’, across which MS signal intensities can be averaged to provide a metabolome fingerprint. The animation below shows how the spectrum changes across the ‘plug flow’ region of an example FIE-HRMS injection acquired in negative ionisation mode.

Spectral binning is applied on a scan-by-scan basis: the m/z values are rounded to the specified bin width, the signals within each bin are sum aggregated, and the resulting intensities are averaged across the specified scans.
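The three steps described above can be sketched in base R on a toy peak list. This is a minimal illustration and not the package's implementation: the `peaks` data frame and all of its values are invented, and the division by the number of scans assumes that a bin absent from a scan contributes zero intensity.

```r
# Toy centroided peak list: three scans, each with m/z and intensity values.
# All values are invented for illustration.
peaks <- data.frame(
  scan      = c(1, 1, 1, 2, 2, 3, 3, 3),
  mz        = c(133.0141, 133.0144, 191.0197, 133.0142, 191.0199,
                133.0139, 133.0143, 191.0195),
  intensity = c(100, 50, 200, 120, 180, 90, 60, 220)
)

# Step 1: round m/z to the bin width (2 decimal places = 0.01 amu).
peaks$bin <- round(peaks$mz, 2)

# Step 2: sum the intensities falling into each bin within each scan.
per_scan <- aggregate(intensity ~ scan + bin, data = peaks, FUN = sum)

# Step 3: average the summed intensities across the selected scans.
# Assumption: a bin missing from a scan counts as zero intensity.
n_scans <- length(unique(peaks$scan))
binned <- aggregate(intensity ~ bin, data = per_scan,
                    FUN = function(x) sum(x) / n_scans)
binned
```

Each row of `binned` is one 0.01 amu bin with its scan-averaged intensity, which corresponds to one column of the intensity matrices produced later in this document.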

Prior to the use of binneR, vendor specific raw data files need to be converted to one of the open file formats such as .mzXML or .mzML so that they can be parsed into R. Data should also be centroided to reduce the bin splitting artifacts that profile data can introduce during spectral binning. The msconvert tool can be used for both data conversion and centroiding, allowing the use of vendor specific algorithms.

There are two main functionalities provided by this package.

• Simple intensity matrix production - quick FIE-HRMS matrix investigations.
• binneRlyse - processing for routine metabolomics fingerprinting experiments.

The subsequent sections will outline the use of these two main functionalities.

Before we begin, the necessary packages need to be loaded.

library(binneR)
library(metaboData)

## Infusion Scan Detection

In order to apply the spectral binning approach for FIE-HRMS data, the infusion scans need to be detected. For a set of specified file paths, the range of infusion scans can be detected using the following:

infusionScans <- detectInfusionScans(
metaboData::filePaths('FIE-HRMS','BdistachyonEcotypes')[1],
sranges = list(c(70,1000)),
thresh = 0.5
)
infusionScans
## [1]  5  6  7  8  9 10 11 12 13
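The detection rule itself is not described here. As a rough sketch of what a threshold such as `thresh = 0.5` could mean, the example below assumes that a scan is counted as an infusion scan when its total ion count is at least `thresh` times the maximum total ion count of the injection. Both this rule and the `tic` values are assumptions made for illustration, not the behaviour of detectInfusionScans().

```r
# Invented total ion counts for 20 scans of a single injection.
tic <- c(2, 3, 4, 10, 60, 85, 95, 100, 98, 92,
         80, 65, 55, 30, 12, 6, 4, 3, 2, 2)

# Assumed rule: keep scans whose total ion count reaches
# at least `thresh` times the maximum total ion count.
thresh <- 0.5
infusion <- which(tic >= thresh * max(tic))
infusion
```

Lowering `thresh` in this sketch widens the detected scan range, and raising it narrows the range towards the apex of the infusion profile.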

The detected scans can then be checked by plotting an averaged chromatogram for these files. The infusion scans can also be plotted by supplying the range to the scans argument.

plotChromFromFile(
metaboData::filePaths('FIE-HRMS','BdistachyonEcotypes')[1],
sranges = list(c(70,1000)),
scans = infusionScans
)

## Simple Intensity Matrix Production - quick FIE-HRMS matrix investigations

The simplest functionality of binneR is to read raw data from a vector of specified file paths, bin these to a given amu and aggregate across a given scan window. This can be useful for a quick assessment of FIE-HRMS data structures. Spectral binning can be performed using the readFiles() function as shown below. The example file can be specified using the following.

file <- metaboData::filePaths('FIE-HRMS','BdistachyonEcotypes')[1]

Then the data can be spectrally binned using:

res <- readFiles(file,
dp = 2,
scans = infusionScans,
sranges = list(c(50, 1000)),
modes = c("n","p"),
nCores = 1)

This will return a list containing the intensity matrices for each ionisation mode, with the rows being the individual samples and columns the spectral bins.

str(res)
## List of 2
##  $ n: num [1, 1:1739] 29595 14836 9305 8681 9133 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:1739] "n62.98" "n62.99" "n63" "n63.01" ...
##  $ p: num [1, 1:1942] 465.5 62.9 27.7 25.3 192.5 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:1942] "p55" "p55.02" "p55.03" "p55.04" ...
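Because the column names encode the ionisation mode and the bin m/z, they can be parsed back into numeric values for a quick inspection. The sketch below uses a small mock list, `res_mock`, that mirrors the structure shown above; the object and its contents are invented for illustration.

```r
# Mock result mirroring the structure shown above (values are invented).
res_mock <- list(
  n = matrix(c(29595, 14836, 9305), nrow = 1,
             dimnames = list(NULL, c("n62.98", "n62.99", "n63"))),
  p = matrix(c(465.5, 62.9, 27.7), nrow = 1,
             dimnames = list(NULL, c("p55", "p55.02", "p55.03")))
)

# Strip the leading mode letter from the column names to recover the bin m/z.
bin_mz <- lapply(res_mock, function(m) {
  as.numeric(sub("^[np]", "", colnames(m)))
})
bin_mz$n

# Name of the most intense negative mode bin for the first sample.
top_bin <- colnames(res_mock$n)[which.max(res_mock$n[1, ])]
top_bin
```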

## binneRlyse - metabolomics fingerprinting experiments

Routine FIE-HRMS metabolomic fingerprinting experiments can require the rapid processing of hundreds of MS files that will also require sample information such as biological classes for subsequent statistical analyses. The package allows for a Binalysis that formalises the spectral binning approach using an S4 class that not only bins the data to 0.01 amu but will also extract accurate m/z for each of these bins based on 0.00001 amu binned data. The accurate m/z data can be aggregated based on a specified class structure from which the modal accurate m/z is extracted. Some bin measures are also computed that allow the assessment of the quality of the 0.01 amu bins.
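The idea of extracting an accurate m/z for a 0.01 amu bin can be sketched for a single bin in base R. This is only an illustration under stated assumptions: the `signals` values are invented, class-wise aggregation is left out, and the "modal" accurate m/z is assumed to be the 0.00001 amu value with the highest summed intensity.

```r
# Invented centroided signals falling within one 0.01 amu bin (133.01).
signals <- data.frame(
  mz        = c(133.01412, 133.01412, 133.01415, 133.01409, 133.01412),
  intensity = c(500, 450, 120, 80, 520)
)

# Bin to 0.01 amu and, separately, to 0.00001 amu.
signals$bin      <- round(signals$mz, 2)
signals$accurate <- round(signals$mz, 5)

# Sum the intensities for each accurate m/z within the 0.01 amu bin.
totals <- aggregate(intensity ~ bin + accurate, data = signals, FUN = sum)

# Assumption: the modal accurate m/z is the one with the highest summed intensity.
modal <- totals[which.max(totals$intensity), ]
modal
```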

Subsequent analyses of these data can easily be applied using the metabolyseR package. The metaboWorkflows package also provides customisable wrapper workflows for high resolution FIE-MS analyses.

The example data used here is from the metaboData package and consists of a comparison of leaf tissue from four B. distachyon ecotypes.

### Basic Usage

There are two main functions for processing experimental data:

• binParameters() - allows the selection of processing parameters.
• binneRlyse() - input data file paths and sample information to process using the selected parameters.

#### Sample information

binneRlyse() requires the provision of sample information (info) for the experimental run to be processed. This should be in csv format and the recommended column headers include:

• fileOrder - the file order in alphabetical order as returned by list.files()
• injOrder - the injection order of the samples during FIE-HRMS analysis
• fileName - the sample file name
• batch - the sample batch
• block - the randomised block of the sample
• name - the sample name
• class - the sample class

The row order of the info file should match the order in which the file paths are submitted to the binneRlyse() processing function.
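A sample information file with the recommended columns could be assembled as in the sketch below. The file names, sample names and classes are hypothetical, and the file is written to a temporary location purely to show the csv layout.

```r
# Hypothetical sample information for four files; all values are invented.
info_example <- data.frame(
  fileOrder = 1:4,
  injOrder  = c(3, 1, 4, 2),
  fileName  = c("sample_1.mzML", "sample_2.mzML",
                "sample_3.mzML", "sample_4.mzML"),
  batch     = 1,
  block     = c(1, 1, 2, 2),
  name      = c("A_1", "B_1", "A_2", "B_2"),
  class     = c("A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Write to a temporary csv file and read it back to confirm the layout.
info_path <- tempfile(fileext = ".csv")
write.csv(info_example, info_path, row.names = FALSE)
info_check <- read.csv(info_path, stringsAsFactors = FALSE)
colnames(info_check)
```

Here the rows are sorted by `fileOrder`, so they line up with file paths listed in alphabetical order.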

#### Parameters

Prior to spectral binning, the processing parameters first need to be selected. The default parameters can be initialised as a BinParameters object using the binParameters() function as shown below.

binParameters()
##
## Scans: 5:12
## Modes: n, p
## Scan Ranges: 70:1000
## No. Cores: 8
## Cluster Type: FORK

These parameters specify the following:

• scans - the scan indexes to use for binning
• modes - the scan order and names of the ionisation modes
• sranges - a list of vectors containing minimum and maximum ranges for the scan events present
• cls - the column of the info that contains class information if relevant
• nCores - the number of cores to use for parallelisation
• clusterType - the cluster type to use for parallelisation

Parameters can be altered upon initialisation of the BinParameters object by specifying the parameter and its value when calling the binParameters() function as shown below.

binParameters(scans = 6:14)
##
## Scans: 6:14
## Modes: n, p
## Scan Ranges: 70:1000
## No. Cores: 8
## Cluster Type: FORK

Alternatively, for an already initialised BinParameters object, the parameter of interest can be changed by directly accessing its slot as shown below.

parameters <- binParameters()
parameters@scans <- 6:14
parameters
##
## Scans: 6:14
## Modes: n, p
## Scan Ranges: 70:1000
## No. Cores: 8
## Cluster Type: FORK

#### Processing

Processing is simple and requires only the use of the binneRlyse() function. The inputs of this function are a vector of the paths of the data files to process, a tibble containing the sample info and a BinParameters object. The files and info inputs for the example data set are shown below.

files <-  metaboData::filePaths('FIE-HRMS','BdistachyonEcotypes')

info <- metaboData::runinfo('FIE-HRMS','BdistachyonEcotypes')
## Parsed with column specification:
## cols(
##   fileOrder = col_double(),
##   injOrder = col_double(),
##   fileName = col_character(),
##   batch = col_double(),
##   block = col_double(),
##   name = col_character(),
##   class = col_character()
## )

It is crucial that the positions of the sample information in the info file match the sample positions within the files vector. Below shows an example of how this can be checked by matching the file names present in the info with those in the vector.

TRUE %in% (info$fileName != basename(files))
## [1] FALSE
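If the check reports a mismatch, the sample information can be reordered to follow the file vector. The sketch below uses hypothetical file paths and a deliberately shuffled table, `info_unordered`, and assumes that every file name appears exactly once in the sample information.

```r
# Hypothetical file paths and sample information, deliberately out of order.
files_example <- c("data/sample_1.mzML", "data/sample_2.mzML",
                   "data/sample_3.mzML")
info_unordered <- data.frame(
  fileName = c("sample_3.mzML", "sample_1.mzML", "sample_2.mzML"),
  class    = c("B", "A", "A"),
  stringsAsFactors = FALSE
)

# Fail early if any file has no matching row in the sample information.
stopifnot(all(basename(files_example) %in% info_unordered$fileName))

# Reorder the rows so that row i describes file i.
row_index <- match(basename(files_example), info_unordered$fileName)
info_ordered <- info_unordered[row_index, ]

# Repeating the mismatch check on the reordered table.
mismatch <- any(info_ordered$fileName != basename(files_example))
mismatch
```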

Spectral binning can then be performed with the following.

analysis <- binneRlyse(files,info,binParameters(scans = detectInfusionScans(files),cls = 'class'))
##
##  binneR v2.0.11 Tue Mar 19 14:48:27 2019
## _______________________________________________________
##
## Scans: 5:13
## Modes: n, p
## Scan Ranges: 70:1000
## Class: class
## No. Cores: 2
## Cluster Type: PSOCK
## _______________________________________________________
##
## Completed! [1M 25.1S]
##
## Tue Mar 19 14:49:52 2019
## Samples: 68
## n: 7640 features
## p: 8021 features
## Average Purity: 0.935
## Average Centrality: 0.527

For data quality inspection, the infusion profiles for this data can be plotted using:

plotChromatogram(analysis)

The spectrum fingerprints using:

plotFingerprint(analysis)

And the total ion counts using:

plotTIC(analysis)

Density profiles for individual bins can be plotted by:

plotBin(analysis,'n133.01',cls = TRUE)

#### Data Extraction

There are a number of functions that can be used to return processing data from a Binalysis object:

• info() for sample information
• binnedData() for the spectrally binned matrices
• accurateData() for the accurate mass information for each of the 0.01 amu bins

### Bin Measures

There are a number of measures that can be computed that allow the assessment of the quality of a given 0.01 amu bin in terms of the accurate m/z peaks present within its boundaries. These include both purity and centrality.

#### Purity

Bin purity gives a measure of the spread of accurate m/z peaks found within a given bin and can be a signal for the presence of multiple real spectral peaks within a bin. When the total ion count (t) for a given bin is greater than 1, purity is calculated by

$p = 1 - \frac{\sigma}{w}$

Where p is purity, $$\sigma$$ is the standard deviation of the accurate m/z present within the bin and w is the width of the bin in amu. Else, when $$t = 1$$, p is also equal to 1. A purity closer to 1 indicates that the accurate m/z present within a bin are found over a narrow region and are therefore likely to be the result of only one real mass spectral peak. A reduction in purity could indicate the presence of multiple peaks within a bin.
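The purity equation can be tried out on invented values with a small helper. In this sketch `bin_purity()` is a hypothetical function, not part of binneR; it assumes an unweighted standard deviation and treats the number of accurate m/z values as t.

```r
# Purity of a bin from the accurate m/z values found within it.
# Assumptions: unweighted standard deviation; t is the number of values.
bin_purity <- function(mz, w = 0.01) {
  if (length(mz) <= 1) {
    return(1)  # a single value has purity 1 by definition
  }
  1 - sd(mz) / w
}

# Invented accurate m/z values: one tight peak, and two peaks sharing a bin.
single_peak <- c(133.0141, 133.0142, 133.0143)
two_peaks   <- c(98.9551, 98.9552, 98.9588, 98.9589)

bin_purity(single_peak)  # approximately 0.99
bin_purity(two_peaks)    # noticeably lower, as the values are spread out
bin_purity(133.0142)     # 1
```

The tight cluster has a standard deviation of 0.0001 amu, which is 1% of the bin width, so its purity is about 0.99. The two separated clusters inflate the standard deviation and pull the purity down.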

The example density plots below show two negative ionisation mode 0.01 amu bins with high (n133.01) and low (n98.96) purity respectively.

Bin n133.01, which has a purity very close to 1, has only one peak present. Bin n98.96, which has a reduced purity, clearly has two peaks present.

#### Centrality

Bin centrality gives a measure of how close the mean of the accurate m/z is to the center of a given bin and can give an indication of whether a peak could have been split across the boundary of two adjacent bins. Centrality is calculated for a given bin using the equation below.

$c = 1 - \frac{|\mu - k|}{\frac{1}{2}w}$

Where c is centrality, $$\mu$$ is the mean accurate m/z present in the bin, k is the center of the bin and w is the bin width in amu. A centrality close to 1 indicates that the accurate m/z present within the boundaries of the bin are located close to the center of the bin. Low centrality would indicate that the accurate m/z present within the bin are found close to the bin boundary and could therefore indicate bin splitting, where a mass spectral peak is split between two adjacent bins.
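The centrality equation can be evaluated in the same way. `bin_centrality()` is a hypothetical helper written for this illustration, and the accurate m/z values are invented; the bin center k is passed in explicitly and the mean is unweighted.

```r
# Centrality of a bin: 1 - |mu - k| / (w / 2).
# Assumption: mu is the unweighted mean of the accurate m/z values.
bin_centrality <- function(mz, k, w = 0.01) {
  mu <- mean(mz)
  1 - abs(mu - k) / (0.5 * w)
}

# Invented accurate m/z values for two 0.01 amu bins.
central_peak <- c(88.0399, 88.0401, 88.0403)     # near the center of bin 88.04
edge_peak    <- c(104.0344, 104.0345, 104.0346)  # near the upper edge of bin 104.03

bin_centrality(central_peak, k = 88.04)   # approximately 0.98
bin_centrality(edge_peak, k = 104.03)     # approximately 0.1
```

In the second case the mean sits 0.0045 amu from the bin center, which is 90% of the half-width of 0.005 amu, so only a small shift in m/z would move the signals into the adjacent bin.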

The example density plots below show two negative ionisation mode 0.01 amu bins with high (n88.04) and low (n104.03) centrality respectively.

Bin n88.04 has a high centrality, with a single peak that is located very close to the center of the bin. In contrast, bin n104.03 has a low centrality, with a single peak that is located very close to the upper boundary of the bin, which could indicate that it has been split between this bin and bin n104.04.