Prepare Gesel database files from various pieces of gene set information.
prepareDatabaseFiles(
    species,
    collections,
    set.info,
    set.membership,
    num.genes,
    path = ".",
    validate = TRUE
)

species: String specifying the species in the form of its NCBI taxonomy ID.

collections: Data frame of information about each gene set collection, where each row corresponds to a collection. This data frame should contain the title, description, source and maintainer columns as described in ?fetchAllCollections.

set.info: List of data frames of length equal to nrow(collections). Each data frame corresponds to a collection, and each of its rows corresponds to a gene set. Each data frame should have the name and description columns as described in ?fetchAllSets.

set.membership: List of lists of integer vectors. Each inner list corresponds to a collection and each vector corresponds to a gene set in that collection. Each vector contains the identities of its constituent genes, as row indices into the data frame returned by fetchAllGenes. All gene indices should be positive and no greater than num.genes. (Unsorted and duplicate entries are allowed.)

num.genes: Integer specifying the total number of genes available for this species.

path: String containing the path to a directory in which to create the database files.

validate: Boolean indicating whether to run validateDatabaseFiles on the newly created files.
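The shape constraints described for the arguments above can be checked before any files are written. The following is a minimal sketch in base R; checkSetInputs is a hypothetical helper introduced here for illustration and is not part of gesel. It is also not a substitute for validateDatabaseFiles, which checks the files after they have been created.

```r
# Hypothetical helper (not part of gesel): checks the input shapes
# described in the argument documentation.
checkSetInputs <- function(collections, set.info, set.membership, num.genes) {
    # One entry per collection in both lists.
    stopifnot(
        length(set.info) == nrow(collections),
        length(set.membership) == nrow(collections)
    )
    for (i in seq_len(nrow(collections))) {
        # One membership vector per gene set in this collection.
        stopifnot(length(set.membership[[i]]) == nrow(set.info[[i]]))
        indices <- unlist(set.membership[[i]])
        # Gene indices are 1-based and must not exceed num.genes.
        stopifnot(all(indices >= 1L), all(indices <= num.genes))
    }
    invisible(TRUE)
}

# Small mock inputs: one collection containing two gene sets.
collections <- data.frame(
    title = "FOO",
    description = "I am a foo",
    maintainer = "Aaron",
    source = "https://foo"
)
set.info <- list(
    data.frame(name = c("FOO_1", "FOO_2"), description = c("first", "second"))
)
# Duplicates and unsorted entries are allowed, so c(1L, 5L, 5L) passes.
set.membership <- list(list(c(1L, 5L, 5L), c(10L, 2L)))
ok <- checkSetInputs(collections, set.info, set.membership, num.genes = 10L)
```

An index larger than num.genes, or a mismatch between the number of vectors and the number of rows in the corresponding set.info data frame, causes the helper to stop with an error.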
Several files are produced at path with the <species>_ prefix.
These can be made available for download with downloadDatabaseFile.
# Mocking up some information.
collections <- data.frame(
    title=c("FOO", "BAR"),
    description=c("I am a foo", "I am a bar"),
    maintainer=c("Aaron", "Aaron"),
    source=c("https://foo", "https://bar")
)
set.info <- list(
    data.frame(
        name=sprintf("FOO_%i", seq_len(20)),
        description=sprintf("this is FOO %i", seq_len(20))
    ),
    data.frame(
        name=sprintf("BAR_%i", seq_len(50)),
        description=sprintf("this is BAR %i", seq_len(50))
    )
)
# Mocking up the gene sets.
num.genes <- 10000
set.membership <- list(
    lapply(seq_len(nrow(set.info[[1]])), function(i) {
        sample(num.genes, sample(500, 1))
    }),
    lapply(seq_len(nrow(set.info[[2]])), function(i) {
        sample(num.genes, sample(200, 1))
    })
)
# Now making the database files.
output <- tempfile()
dir.create(output)
prepareDatabaseFiles(
    "9606",
    collections,
    set.info,
    set.membership,
    num.genes,
    output
)
# We can then read directly from them:
config <- newConfig(fetch.file=function(x) file.path(output, x))
head(fetchAllSets("9606", config))
#>    name   description size collection number
#> 1 FOO_1 this is FOO 1  173          1      1
#> 2 FOO_2 this is FOO 2  416          1      2
#> 3 FOO_3 this is FOO 3   69          1      3
#> 4 FOO_4 this is FOO 4  460          1      4
#> 5 FOO_5 this is FOO 5  257          1      5
#> 6 FOO_6 this is FOO 6  121          1      6
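In practice, gene sets are often defined by gene identifiers rather than by row indices. The sketch below shows one way to build set.membership entries from symbols using base R's match(). The genes data frame is a mock stand-in for the table returned by fetchAllGenes, and it assumes exactly one symbol per row; the real table may hold several identifiers per gene, in which case the lookup would need adjusting. The names sets.by.symbol and toIndices are hypothetical.

```r
# Mock stand-in for the gene table; in practice this would come from
# fetchAllGenes(). Assumes one symbol per row, which may not hold for real data.
genes <- data.frame(symbol = c("SNAP25", "NEUROD6", "GAD1", "SLC17A7"))

# Hypothetical gene sets defined by symbol.
sets.by.symbol <- list(
    c("GAD1", "SNAP25"),
    c("SLC17A7", "UNKNOWN", "NEUROD6")
)

# Convert symbols to row indices, dropping symbols absent from the table.
toIndices <- function(symbols, table) {
    idx <- match(symbols, table)
    idx[!is.na(idx)]
}

# One integer vector per gene set, i.e., one inner list of set.membership.
membership <- lapply(sets.by.symbol, toIndices, table = genes$symbol)
```

Dropping unmatched symbols keeps every index within 1 to nrow(genes), which is what the requirement on num.genes needs; whether to drop them silently or report them is a design choice left to the caller.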