avidaR: an R library for performing complex queries on digital organisms stored in avidaDB.

Introduction.

avidaR was born as part of a project focused on providing semantics to data resulting from experiments carried out in Avida, an Artificial Life software platform for studying the evolution of self-replicating computer programs known as digital organisms. This package provides users of the R programming language with easy-to-use tools for performing complex queries on avidaDB, a semantic database that stores the genomes and transcriptomes of more than a million digital organisms. The avidaR library was developed by the Computational Biology Lab of the Doñana Biological Station (EBD), a research institute of the Spanish National Research Council (CSIC) based in Seville (Spain).

Installation.

avidaR depends on the following packages:

- base64enc
- xml2
- httr
- dplyr
- readr
- tidyr
- tibble
- circlize
- RColorBrewer
- R6
- devtools: needed for the GitLab installation.

First install those packages with install.packages(). Then install avidaR from GitLab using the following command:

devtools::install_gitlab("fortunalab/avidaR@main")
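The dependency step described above can be sketched as follows. This is a minimal example, not part of avidaR itself: it installs only the packages from the list that are not yet available locally. The CRAN mirror URL is an assumption chosen so the call also works in non-interactive sessions; any mirror can be substituted.

```r
# Packages listed as avidaR dependencies in this README.
deps <- c("base64enc", "xml2", "httr", "dplyr", "readr", "tidyr",
          "tibble", "circlize", "RColorBrewer", "R6", "devtools")

# requireNamespace() returns TRUE when a package can be loaded.
is_installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
missing_deps <- deps[!is_installed]

# Install only what is missing (the mirror URL is an assumption).
if (length(missing_deps) > 0) {
  install.packages(missing_deps, repos = "https://cloud.r-project.org")
}
```

Once this completes, the devtools::install_gitlab() call shown above should have everything it needs.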

Usage.

avidaR is loaded as follows:

library(avidaR)

Connect to avidaDB.

avidaDB is a semantic database (or triple-store) containing the genomes and transcriptomes of more than a million digital organisms, stored as RDF data. It can be queried using the SPARQL query language. The avidaR library can connect to any triple-store that supports the RDF4J server REST API, such as GraphDB. Since avidaDB is implemented in GraphDB, avidaR can establish either a basic connection (with no password or with basic HTTP user-password authentication) or a connection secured with an API access token.
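To make the role of the RDF4J REST API concrete, here is a hedged sketch of a raw SPARQL request sent with httr, bypassing avidaR entirely. It assumes the repository is exposed at the standard RDF4J path (server URL followed by /repositories/ and the repository name) and that the public credentials shown elsewhere in this README also apply to avidaDB_test. The query is deliberately generic and assumes nothing about the avidaDB schema.

```r
library(httr)

# Assumption: standard RDF4J endpoint layout <server>/repositories/<repository>.
endpoint <- "https://graphdb.fortunalab.org/repositories/avidaDB_test"

# Generic query: fetch any five triples, whatever the schema is.
query <- "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"

response <- GET(
  endpoint,
  query = list(query = query),                    # sent as ?query=...
  authenticate("public_avida", "public_avida"),   # basic HTTP authentication
  accept("application/sparql-results+json")       # ask for JSON results
)

# Stop with an informative error if the server did not answer 2xx.
stop_for_status(response)

# SPARQL JSON results keep the rows under results$bindings.
results <- content(response, as = "parsed", type = "application/json")
str(results$results$bindings, max.level = 2)
```

In normal use the avidaR functions build and send such queries themselves, so this is only meant to illustrate what happens underneath.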

Once avidaR is loaded, it points by default to a small subset of the database (avidaDB_test) intended for testing. The current access options can be shown by running triplestore$access_options(). To get access to the entire database, change the name of the repository from avidaDB_test to avidaDB by running:

triplestore$set_access_options(
  url = "https://graphdb.fortunalab.org",
  user = "public_avida",
  password = "public_avida",
  repository = "avidaDB"
)
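A quick way to confirm that the switch took effect is to show the access options before and after the change. The sketch below uses only the two triplestore methods mentioned in this section. What access_options() prints or returns is not documented here, so the output is shown as-is without assuming its structure.

```r
library(avidaR)

# Options right after loading: expected to point to avidaDB_test.
triplestore$access_options()

# Switch to the full database with the public credentials.
triplestore$set_access_options(
  url = "https://graphdb.fortunalab.org",
  user = "public_avida",
  password = "public_avida",
  repository = "avidaDB"
)

# Show the options again to check that the repository changed.
triplestore$access_options()
```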

Get data from avidaDB.

The following function can be used to get the genome sequence of a single genome (e.g., genome_id = 1):

get_genome_seq_from_genome_id(genome_id = 1)

or to get the genome sequences of multiple genomes at once:

get_genome_seq_from_genome_id(genome_id = c(1, 2, 3))
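The result can be stored in a variable and inspected like any other R object. The sketch below makes no assumption about the shape of the returned value (the variable name seqs is illustrative); str() and head() work on most R objects and reveal the structure actually returned.

```r
library(avidaR)

# Keep the query result instead of only printing it.
seqs <- get_genome_seq_from_genome_id(genome_id = c(1, 2, 3))

# Inspect what came back: class, structure, and first entries.
class(seqs)
str(seqs)
head(seqs)
```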

To get more details about any function, use R's help operator: type the name of the function preceded by the symbol ?:

?get_genome_seq_from_genome_id
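Base R can also list everything the package exports, which is handy for finding the function to look up. A minimal sketch follows; the "^get_genome" filter is only an illustration derived from the one function name shown above, not a documented naming rule.

```r
library(avidaR)

# Names of all objects exported by the attached package.
exported <- ls("package:avidaR")
print(exported)

# Illustrative filter: exported names starting with "get_genome".
grep("^get_genome", exported, value = TRUE)

# Open the package help index.
help(package = "avidaR")
```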

List of available functions grouped by the target entity:

Get the genome of a digital organism:

Get the phenotype encoded by the genome of a digital organism:

Get the logic operations (i.e., traits) defining the phenotype of a digital organism:

Get the transcriptome executed by a digital organism:

Get the tandem repeat contained in the transcriptome of a digital organism:

Miscellaneous functions:

Source code.

avidaR was developed by Raúl Ortega.