prepareDatabaseFiles
Prepare Gesel database files from various pieces of gene set information.
prepareDatabaseFiles(
    species,
    collections,
    set.info,
    set.membership,
    num.genes,
    path = "."
)
species: String specifying the species in the form of its NCBI taxonomy ID.
collections: Data frame of information about each gene set collection, where each row corresponds to a collection. This data frame should contain the same columns as that returned by fetchAllCollections.
set.info: Data frame of information about each gene set, where each row corresponds to a set. This data frame should contain the same columns as that returned by fetchAllSets.
set.membership: List of integer vectors, where each vector corresponds to a gene set and contains the indices of its constituent genes. All gene indices should be positive, no greater than num.genes, and unique within each set.
num.genes: Integer scalar specifying the total number of genes available for this species.
path: String containing the path to a directory in which to create the database files.
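The constraints on set.membership (positive indices, no greater than num.genes, unique within each set) can be checked before calling the function. The following is a minimal sketch in base R; validateMembership is a hypothetical helper written for illustration and is not part of the package.

```r
# Hypothetical helper (not part of gesel): checks the documented constraints
# on 'set.membership'. Returns TRUE if they all hold, otherwise raises an error.
validateMembership <- function(set.membership, num.genes) {
    stopifnot(is.list(set.membership))
    for (i in seq_along(set.membership)) {
        genes <- set.membership[[i]]
        if (!is.numeric(genes) || anyNA(genes) || any(genes != round(genes))) {
            stop("set ", i, " contains non-integer or missing indices")
        }
        if (any(genes < 1L) || any(genes > num.genes)) {
            stop("set ", i, " contains indices outside of [1, num.genes]")
        }
        if (anyDuplicated(genes)) {
            stop("set ", i, " contains duplicated indices")
        }
    }
    TRUE
}

# Small illustrative inputs.
ok <- list(c(1L, 5L, 9L), c(2L, 10L))
validateMembership(ok, num.genes = 10L)

bad <- list(c(1L, 1L, 3L)) # duplicated index within a set
tryCatch(validateMembership(bad, num.genes = 10L), error = conditionMessage)
```

Failing early with a message that names the offending set is usually easier to debug than an error raised partway through writing files.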
Several files are created in the directory at path, each named with the <species>_ prefix.
These can be made available for download with downloadDatabaseFile.
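Because every output file carries the <species>_ prefix, the files belonging to one species can be picked out of a shared directory by name. The sketch below illustrates this with a stand-in directory; the file names are placeholders chosen for the example and are not the names the function actually writes.

```r
# Stand-in for the 'path' directory. The file names are placeholders,
# not the actual names written by prepareDatabaseFiles().
path <- tempfile()
dir.create(path)
invisible(file.create(file.path(
    path,
    c("9606_placeholder.txt", "10090_placeholder.txt")
)))

species <- "9606"

# Keep only the files whose names start with "<species>_".
produced <- list.files(path, pattern = paste0("^", species, "_"))
produced
```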
# Mocking up some information.
collections <- data.frame(
    title=c("FOO", "BAR"),
    description=c("I am a foo", "I am a bar"),
    maintainer=c("Aaron", "Aaron"),
    source=c("https://foo", "https://bar"),
    start=c(1L, 21L),
    size=c(20L, 50L)
)
set.info <- data.frame(
    name=c(
        sprintf("FOO_%i", seq_len(20)),
        sprintf("BAR_%i", seq_len(50))
    ),
    description=c(
        sprintf("this is FOO %i", seq_len(20)),
        sprintf("this is BAR %i", seq_len(50))
    ),
    collection=rep(1:2, c(20L, 50L))
)
# Mocking up the gene sets.
num.genes <- 10000
set.membership <- split(
    sample(num.genes, 5000, replace=TRUE),
    factor(
        sample(nrow(set.info), 5000, replace=TRUE),
        seq_len(nrow(set.info))
    )
)
set.membership <- lapply(set.membership, unique)
set.info$size <- lengths(set.membership)
# Now making the database files.
output <- tempfile()
dir.create(output)
prepareDatabaseFiles(
    "9606",
    collections,
    set.info,
    set.membership,
    num.genes,
    output
)
# We can then read directly from them:
config <- newConfig(fetch.file=function(x) file.path(output, x))
head(fetchAllSets("9606", config))
#>    name   description size collection number
#> 1 FOO_1 this is FOO 1   71          1      1
#> 2 FOO_2 this is FOO 2   77          1      2
#> 3 FOO_3 this is FOO 3   62          1      3
#> 4 FOO_4 this is FOO 4   76          1      4
#> 5 FOO_5 this is FOO 5   64          1      5
#> 6 FOO_6 this is FOO 6   79          1      6
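The mock-up above sets the start and size columns of collections by hand. A minimal sketch of deriving them from the per-set collection assignment instead is shown below. It assumes, as the mock values suggest but this page does not state, that start is the 1-based index of a collection's first set, that size is its number of sets, and that sets of the same collection are stored contiguously.

```r
# Collection assignment for each set, as in the mock 'set.info' above.
collection <- rep(1:2, c(20L, 50L))

# Assumption: sets belonging to the same collection are contiguous and
# collections are numbered 1, 2, ... in order of appearance.
runs <- rle(collection)
stopifnot(identical(runs$values, seq_along(runs$values)))

# Number of sets per collection, and the index of each collection's first set.
size <- runs$lengths
start <- cumsum(c(1L, head(size, -1L)))

data.frame(start = start, size = size)
```

For the mock assignment this gives start values of 1 and 21 and sizes of 20 and 50, matching the hand-written columns.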