TaxaNorm Introduction


This document introduces TaxaNorm, an R package for normalizing microbiome taxa data. It walks through how to install the package and how to use it to analyze and visualize microbiome data. TaxaNorm normalizes microbiome data using the Zero-Inflated Negative Binomial (ZINB) method.

What is the ZINB method?
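
As a general reminder of the distribution behind the name, a zero-inflated negative binomial mixes a point mass at zero with an ordinary negative binomial count distribution, so zeros can arise either from the extra "structural zero" component or from the count component itself. The sketch below is a minimal base R illustration of that textbook definition. The function name dzinb and the parameter names pi0, mu and size are illustrative choices, not part of TaxaNorm, and the package's own parameterization may differ.

```r
# Probability mass function of a zero-inflated negative binomial.
# pi0  : probability of a structural zero (illustrative name)
# mu   : mean of the negative binomial component
# size : dispersion parameter of the negative binomial component
dzinb <- function(x, pi0, mu, size) {
  nb <- dnbinom(x, size = size, mu = mu)
  # Zeros get the extra point mass; positive counts are scaled by (1 - pi0).
  ifelse(x == 0, pi0 + (1 - pi0) * nb, (1 - pi0) * nb)
}

# The zero-inflated model puts more mass on zero than the plain
# negative binomial with the same mu and size.
p_zero_zinb <- dzinb(0, pi0 = 0.3, mu = 5, size = 2)
p_zero_nb   <- dnbinom(0, size = 2, mu = 5)
total_mass  <- sum(dzinb(0:1000, pi0 = 0.3, mu = 5, size = 2))
```

Setting pi0 to 0 recovers the plain negative binomial, which is a quick sanity check on the definition.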


There are three main steps in using this package:


Required Packages

TaxaNorm requires the packages phyloseq and microbiome, both of which are available on Bioconductor.

Installation from Bioconductor
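
A common way to install the two required packages from Bioconductor is through the BiocManager package. The snippet below is a minimal sketch of that route, assuming a working internet connection and a reasonably recent R version. It installs only phyloseq and microbiome, not TaxaNorm itself.

```r
# Install BiocManager from CRAN if it is not already available.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install the two Bioconductor dependencies named above.
BiocManager::install(c("phyloseq", "microbiome"))
```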

Installation from GitHub

For the newest, but potentially unstable, version of the package, direct installation from GitHub is also supported.


Loading Package into R Environment

library(TaxaNorm)
library(phyloseq)    # sample_data(), prune_taxa()
library(microbiome)  # abundances()
library(ggplot2)
library(vegan)
library(MASS)

Example Usage

Basic Usage

data("TaxaNorm_Example_Input", package = "TaxaNorm")

# run normalization
TaxaNorm_Example_Output <- TaxaNorm_Normalization(
  data = TaxaNorm_Example_Input,
  depth = NULL,
  group = sample_data(TaxaNorm_Example_Input)$body_site,
  filter.cell.num = 10,
  filter.taxa.count = 0,
  random = FALSE,
  ncores = 1
)

# run diagnosis test
Diagnose_Data <- TaxaNorm_Run_Diagnose(
  Normalized_Results = TaxaNorm_Example_Output,
  prev = TRUE,
  equiv = TRUE,
  group = sample_data(TaxaNorm_Example_Input)$body_site
)

Load Input Data

The built-in example data, a phyloseq object, can be loaded with the command below.

data("TaxaNorm_Example_Input", package = "TaxaNorm")

Pre-process Input Data

TaxaNorm provides several QC figures that summarize the characteristics of the input data. These give preliminary criteria for pre-filtering rare, low-information taxa before any analysis, which improves the power and computational efficiency of the algorithm. Users whose data are already cleaned or pre-processed can skip this step.

qc_data <- TaxaNorm_QC_Input(TaxaNorm_Example_Input)

Here we provide a common option: keep a taxon only if at least filter.sample.num samples have a count of filter.taxa.count or more. filter.sample.num can be set to any value, for example the sample size of the smallest group of samples. By default, we use filter.taxa.count = 1 and filter.sample.num = 10. This criterion is also built into the main function TaxaNorm_Normalization() (see its filter.cell.num and filter.taxa.count arguments).

filter.sample.num <- 10
filter.taxa.count <- 1
# keep taxa with a count of at least filter.taxa.count in at least filter.sample.num samples
taxaIn <- rowSums(abundances(TaxaNorm_Example_Input) >= filter.taxa.count) >= filter.sample.num
TaxaNorm_Example_Input <- prune_taxa(taxaIn, TaxaNorm_Example_Input) 
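
To make the filtering rule concrete, here is a small sketch on a made-up count matrix, using only base R. It follows the "at least this many samples with this count or more" wording above. The matrix values and the names min_count and min_samples are illustrative stand-ins for filter.taxa.count and filter.sample.num, not objects from the package.

```r
# Toy count matrix: rows are taxa, columns are samples (made-up numbers).
counts <- matrix(
  c(0, 0, 3, 0,
    5, 2, 0, 7,
    0, 0, 0, 0,
    1, 1, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("taxon", 1:4), paste0("sample", 1:4))
)

min_count   <- 1  # plays the role of filter.taxa.count
min_samples <- 3  # plays the role of filter.sample.num

# For each taxon, count the samples that reach min_count,
# then keep the taxa that do so in at least min_samples samples.
n_samples_present <- rowSums(counts >= min_count)
keep <- n_samples_present >= min_samples

filtered <- counts[keep, , drop = FALSE]
print(n_samples_present)
print(rownames(filtered))
```

In this toy case only taxon2 and taxon4 are present in three or more samples, so they are the only rows retained.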

Users can also apply their own filtering criteria. For example, a basic pre-filter keeps only taxa (rows) that have at least 10 reads in total:

taxaIn <- rowSums(abundances(TaxaNorm_Example_Input)) >= 10
TaxaNorm_Example_Input <- prune_taxa(taxaIn, TaxaNorm_Example_Input) 

QC Input Data

qc_data <- TaxaNorm_QC_Input(TaxaNorm_Example_Input)

Run Normalization

Running the normalization returns a TaxaNorm_Results object. This object contains the input data, raw data, normalized data (normdata), ecdf, model parameters, and convergence information.

# Pick group from phyloseq object
group <- sample_data(TaxaNorm_Example_Input)$body_site
# Run normalization function
Normalized_Data <- TaxaNorm_Normalization(
  data = TaxaNorm_Example_Input,
  depth = NULL,
  group = group,
  filter.taxa.count = 0,
  random = TRUE,
  ncores = 1
)
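
The calls in this document use ncores = 1. If ncores sets the number of CPU cores used for fitting, which is an assumption based on the argument name and not something stated here, one hedged way to pick a larger value is to query the machine with the base parallel package and leave one core free. The variable names below are illustrative.

```r
# Number of cores reported by the machine; detectCores() can return NA.
available <- parallel::detectCores()
if (is.na(available)) {
  available <- 1L
}

# Leave one core free for the rest of the system, but never go below one.
n_cores <- max(1L, available - 1L)

# Assumed usage (not run here):
# Normalized_Data <- TaxaNorm_Normalization(data = TaxaNorm_Example_Input,
#                                           group = group,
#                                           ncores = n_cores)
```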

QC TaxaNorm Model

data("TaxaNorm_Example_Output", package = "TaxaNorm")

TaxaNorm_Model_QC(TaxaNormResults = TaxaNorm_Example_Output)

TaxaNorm NMDS

TaxaNorm_NMDS(TaxaNormResults = TaxaNorm_Example_Output, group_column = "body_site")