Affinity purification-mass spectrometry is one of the most common techniques for the analysis of protein-protein interactions, but inferring bona fide interactions from the resulting data sets remains notoriously difficult. We introduce SFINX, a Straightforward Filtering INdeX that identifies true-positive protein interactions in a fast, user-friendly, and highly accurate way. SFINX outperforms alternative techniques on benchmark data sets and is also available via the Web interface at http://sfinx.ugent.be/.

## Context

The analysis of protein-protein interactions enables scientists to connect genotypes with phenotypes and to answer fundamental biological questions or generate new hypotheses on the functions of proteins. In this field, affinity purification-mass spectrometry is a classical approach wherein a protein of interest (bait) containing an epitope tag is purified under conditions that preserve the protein complex to allow the identification of co-purifying proteins by mass spectrometry.

Several software approaches already exist to separate false positives from true positives in these protein-protein interaction data sets, but none of them combines high accuracy, speed, and user-friendliness without requiring external data as input. Therefore, we developed the Straightforward Filtering INdeX (SFINX), which excels on all these points.

## Access

Users can access SFINX via the Web interface at http://sfinx.ugent.be/ or via this package. The package also makes it easier to integrate SFINX into your own R pipelines or to run it on your own servers.

## Examples

After installing the sfinx package, load it as follows:

```r
library(sfinx)
```

In these examples, we will use the two example files included in the package. These example files contain the original TIP49 data that are also included in the Web interface. The first file (DataInputExampleFile) contains a numerical matrix with the results of the co-complex interactomics experiments. The columns of the matrix correspond to the individual experiments, and the rows correspond to the proteins that were detected in at least one of the experiments. Because each row represents one protein, the row names must be unique. Each cell of the matrix holds the peptide count of the corresponding protein in the corresponding experiment. The first ten rows and the first five columns of this matrix look as follows:

```r
DataInputExampleFile[1:10,1:5]
#>                              ARP5 ARP6_1 ARP6_2 ARP6_3 ARP8_1
#> gi|10440560|ref|NP_066298.1|    0      1      0      0      0
#> gi|10716563|ref|NP_001737.1|    0      2      0      0      0
#> gi|10800130|ref|NP_066409.1|    0      2      2      1      0
#> gi|10800138|ref|NP_066407.1|    0      0      0      0      0
#> gi|10801345|ref|NP_037366.1|    0      1      0      0      0
#> gi|10834990|ref|NP_000606.1|    0      1      0      0      0
#> gi|10835051|ref|NP_001742.1|    0      0      0      0      0
#> gi|10835055|ref|NP_001778.1|    0      0      0      0      0
#> gi|10835063|ref|NP_002511.1|    3      5      3      3      0
#> gi|10835067|ref|NP_003133.1|    0     10      0      0      0
```
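Before running an analysis on your own data, it can help to verify that the input meets the requirements described above. The following is a minimal sketch in base R, not part of the sfinx package: the toy matrix `counts` and the helper `check_sfinx_input()` are hypothetical names introduced only for illustration, and the non-negativity check is an assumption based on the cells being peptide counts.

```r
# Hypothetical toy matrix: rows are proteins, columns are experiments.
counts <- matrix(
  c(0,  1, 0,
    3,  5, 3,
    0, 10, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("proteinA", "proteinB", "proteinC"),
    c("exp1", "exp2", "exp3")
  )
)

# Hypothetical helper (not part of sfinx): stops with an error if the
# matrix breaks one of the requirements described in the text.
check_sfinx_input <- function(m) {
  stopifnot(
    is.matrix(m),                 # input must be a matrix
    is.numeric(m),                # filled with numbers
    !is.null(rownames(m)),        # proteins are identified by row names
    !anyDuplicated(rownames(m)),  # row names must be unique
    all(m >= 0)                   # assumption: peptide counts are not negative
  )
  invisible(TRUE)
}

check_sfinx_input(counts)
```

The same helper could be applied to `DataInputExampleFile` or to a matrix built from your own experiments.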

The second file (BaitIdentityExampleFile) is a character vector that contains all the (bait) proteins of interest. These protein names must match the row names of the first file exactly. If a protein from the second file cannot be found in the first file, it is discarded from the analysis and SFINX warns the user about this. The example vector looks as follows:

```r
BaitIdentityExampleFile
#>  [1] "gi|31542680|ref|NP_079131.2|" "gi|11968057|ref|NP_071941.1|"
#>  [3] "gi|39812115|ref|NP_075050.3|" "gi|24308444|ref|NP_612467.1|"
#>  [5] "gi|39930355|ref|NP_056263.1|" "gi|27734727|ref|NP_775889.1|"
#>  [7] "gi|20149643|ref|NP_060423.2|" "gi|8923598|ref|NP_060386.1|"
#>  [9] "gi|46367785|ref|NP_060292.2|" "gi|38488718|ref|NP_060229.2|"
#> [11] "gi|4504255|ref|NP_002097.1|"  "gi|13775202|ref|NP_112578.1|"
#> [13] "gi|42822884|ref|NP_919257.2|" "gi|32996737|ref|NP_775106.2|"
#> [15] "gi|8922764|ref|NP_060740.1|"  "gi|6912542|ref|NP_036477.1|"
#> [17] "gi|18079254|ref|NP_110442.1|" "gi|19924159|ref|NP_003787.2|"
#> [19] "gi|HsSRCAP"                   "gi|7019371|ref|NP_037474.1|"
#> [21] "gi|4506753|ref|NP_003698.1|"  "gi|5730023|ref|NP_006657.1|"
#> [23] "gi|24041018|ref|NP_705582.1|" "gi|5174715|ref|NP_005988.1|"
#> [25] "gi|5453617|ref|NP_006340.1|"  "gi|7656936|ref|NP_055020.1|"
```
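Because bait names have to match the row names of the matrix exactly, you may want to check in advance which baits would be discarded. The sketch below uses only base R set operations on hypothetical toy vectors (`protein_names` and `baits` are illustrative names, not objects from the package).

```r
# Hypothetical protein names standing in for the row names of the input matrix.
protein_names <- c("proteinA", "proteinB", "proteinC")

# Hypothetical bait vector; "proteinX" has no matching row.
baits <- c("proteinA", "proteinX")

# Baits with a matching row, and baits without one.
matched_baits <- intersect(baits, protein_names)
missing_baits <- setdiff(baits, protein_names)

matched_baits
missing_baits
```

With the example data, the equivalent check would be `setdiff(BaitIdentityExampleFile, rownames(DataInputExampleFile))`.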

You can perform the standard SFINX analysis with the sfinx() function of the sfinx package:

```r
sfinx(DataInputExampleFile, BaitIdentityExampleFile)
```

The output of the sfinx() function is a list with two elements. The first element is a data frame with the filtered baits and preys and their associated SFINX scores: the lower the SFINX score, the stronger the certainty of interaction. The first rows of this data frame are shown below.

```r
head(sfinx(DataInputExampleFile, BaitIdentityExampleFile)[[1]])
#>                                                     Baits       Scores
#> gi|4504041|ref|NP_002061.1|  gi|11968057|ref|NP_071941.1| 2.521500e-08
#> gi|19424130|ref|NP_598000.1| gi|11968057|ref|NP_071941.1| 1.365428e-09
#> gi|NTAP-BHD_mutant           gi|11968057|ref|NP_071941.1| 1.365428e-09
#> gi|18105007|ref|NP_004332.2| gi|11968057|ref|NP_071941.1| 7.510240e-11
#> gi|74136549|ref|NP_115575.1| gi|11968057|ref|NP_071941.1| 7.393988e-11
#> gi|36287069|ref|NP_874369.1| gi|11968057|ref|NP_071941.1| 6.192308e-12
#>                                                     Preys      pValues
#> gi|4504041|ref|NP_002061.1|   gi|4504041|ref|NP_002061.1| 2.521500e-08
#> gi|19424130|ref|NP_598000.1| gi|19424130|ref|NP_598000.1| 1.365428e-09
#> gi|NTAP-BHD_mutant                     gi|NTAP-BHD_mutant 1.365428e-09
#> gi|18105007|ref|NP_004332.2| gi|18105007|ref|NP_004332.2| 8.239662e-11
#> gi|74136549|ref|NP_115575.1| gi|74136549|ref|NP_115575.1| 7.393988e-11
#> gi|36287069|ref|NP_874369.1| gi|36287069|ref|NP_874369.1| 7.347004e-12
```
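In practice it is often convenient to run the analysis once, keep the result in a variable, and then rank the interactions. The following sketch assumes the column names shown in the output above (`Baits`, `Preys`, `Scores`); the variable names `result`, `interactions`, and `ranked` are illustrative.

```r
library(sfinx)

# Run the analysis once and keep the whole result.
result <- sfinx(DataInputExampleFile, BaitIdentityExampleFile)

# First list element: data frame of bait-prey pairs with their scores.
interactions <- result[[1]]

# Order the rows so that the lowest scores (strongest certainty) come first.
ranked <- interactions[order(interactions$Scores), ]
head(ranked[, c("Baits", "Preys", "Scores")])
```

Storing the result also avoids recomputing the analysis each time one of the two list elements is needed.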

The second element gives extra information about the experiment and possible warnings.

```r
sfinx(DataInputExampleFile, BaitIdentityExampleFile)[[2]]
#> [1] "All baits were found as possible preys. Some baits yielded interactions with a lower confidence, as there are not enough negative controls for them:gi|4506753|ref|NP_003698.1| gi|5730023|ref|NP_006657.1|. Please, use more negative controls."
```

As you can see, all the bait proteins in the example vector (BaitIdentityExampleFile) were also found as proteins in the rows of the example matrix (DataInputExampleFile). However, two bait proteins need more negative controls before the results associated with them can be fully trusted.

You can also run SFINX with advanced settings. The parameters and their default values are shown below.

```r
sfinx(InputData, BaitVector, BackgroundRatio = 5, BackgroundIdentity = "automatic", BaitInfluence = FALSE, ConstantLimit = TRUE, FWERType = "B")
```

We direct users to the help files of the sfinx() function to get more insight into the use of these parameters.

You can access the documentation in one of the two following ways:

```r
?sfinx

help(sfinx)
```
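Besides the help pages, the arguments of sfinx() and their default values can be inspected programmatically with the base R function `formals()`. This is a small sketch; the variable name `sfinx_args` is illustrative, and the parameter names are taken from the call signature shown above.

```r
library(sfinx)

# formals() is base R: it returns the arguments of a function together
# with their default values.
sfinx_args <- formals(sfinx)
names(sfinx_args)

# Default value of a single parameter, here BackgroundRatio.
sfinx_args$BackgroundRatio
```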