prepareDatabaseFiles.Rd
Prepare Gesel database files from various pieces of gene set information.
prepareDatabaseFiles(
species,
collections,
set.info,
set.membership,
num.genes,
path = "."
)
species: String specifying the species in the form of its NCBI taxonomy ID.
collections: Data frame of information about each gene set collection, where each row corresponds to a collection.
This data frame should contain the same columns as that returned by fetchAllCollections.
set.info: Data frame of information about each gene set, where each row corresponds to a set.
This data frame should contain the same columns as that returned by fetchAllSets.
set.membership: List of integer vectors, where each vector corresponds to a gene set and contains the indices of its constituent genes.
All gene indices should be positive, no greater than num.genes, and unique within each set.
num.genes: Integer scalar specifying the total number of genes available for this species.
path: String containing the path to a directory in which to create the database files.
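The constraints on set.membership (positive indices, no greater than num.genes, unique within each set) can be checked before the files are written. The following is a minimal sketch; check_membership is a hypothetical helper written for illustration and is not part of the package.

```r
# Hypothetical helper (not part of the package): verifies the documented
# constraints on 'set.membership' and stops with a message on the first failure.
check_membership <- function(set.membership, num.genes) {
    stopifnot(is.list(set.membership))
    for (i in seq_along(set.membership)) {
        idx <- set.membership[[i]]
        if (!is.numeric(idx) || anyNA(idx)) {
            stop("set ", i, " must contain non-missing integer indices")
        }
        if (any(idx < 1) || any(idx > num.genes)) {
            stop("set ", i, " has indices outside [1, ", num.genes, "]")
        }
        if (anyDuplicated(idx)) {
            stop("set ", i, " has duplicated gene indices")
        }
    }
    invisible(TRUE)
}

# A valid input passes silently.
good <- list(c(1L, 5L, 10L), c(2L, 3L))
check_membership(good, num.genes = 10L)

# A set with a repeated gene index is rejected; the message is captured here.
bad <- list(c(1L, 1L, 4L))
res <- tryCatch(
    check_membership(bad, num.genes = 10L),
    error = function(e) conditionMessage(e)
)
```

Running such a check up front gives a clearer error than a failure partway through writing the files.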
Several files are produced at path with the <species>_ prefix.
These can be made available for download with downloadDatabaseFile.
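Because every output file shares the <species>_ prefix, the files for one species can be listed with base R. This sketch uses a stand-in directory with made-up file names, since the actual names of the generated files are not listed here.

```r
# Stand-in for a directory already populated by prepareDatabaseFiles();
# the file names below are invented purely for illustration.
path <- tempfile()
dir.create(path)
file.create(file.path(
    path,
    c("9606_example-a", "9606_example-b", "10090_example-a")
))

# Keep only the files whose names start with the species prefix.
species <- "9606"
produced <- list.files(path, pattern = paste0("^", species, "_"))
print(produced)
```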
# Mocking up some information.
collections <- data.frame(
    title=c("FOO", "BAR"),
    description=c("I am a foo", "I am a bar"),
    maintainer=c("Aaron", "Aaron"),
    source=c("https://foo", "https://bar"),
    start=c(1L, 21L),
    size=c(20L, 50L)
)
set.info <- data.frame(
    name=c(
        sprintf("FOO_%i", seq_len(20)),
        sprintf("BAR_%i", seq_len(50))
    ),
    description=c(
        sprintf("this is FOO %i", seq_len(20)),
        sprintf("this is BAR %i", seq_len(50))
    ),
    collection=rep(1:2, c(20L, 50L))
)
# Mocking up the gene sets.
num.genes <- 10000
set.membership <- split(
    sample(num.genes, 5000, replace=TRUE),
    factor(
        sample(nrow(set.info), 5000, replace=TRUE),
        seq_len(nrow(set.info))
    )
)
set.membership <- lapply(set.membership, unique)
set.info$size <- lengths(set.membership)
# Now making the database files.
output <- tempfile()
dir.create(output)
prepareDatabaseFiles(
    "9606",
    collections,
    set.info,
    set.membership,
    num.genes,
    output
)
# We can then read directly from them:
config <- newConfig(fetch.file=function(x) file.path(output, x))
head(fetchAllSets("9606", config))
#>    name   description size collection number
#> 1 FOO_1 this is FOO 1   71          1      1
#> 2 FOO_2 this is FOO 2   77          1      2
#> 3 FOO_3 this is FOO 3   62          1      3
#> 4 FOO_4 this is FOO 4   76          1      4
#> 5 FOO_5 this is FOO 5   64          1      5
#> 6 FOO_6 this is FOO 6   79          1      6
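In the mock data, the start and size columns of collections line up with the collection column of set.info. This reading (start is the 1-based index of a collection's first set, size is its number of sets) is inferred from the example values rather than stated in the documentation. Under that assumption, and assuming sets from the same collection are stored contiguously, the two columns could be derived as follows:

```r
# Same collection assignments as the mock 'set.info' in the example.
collection <- rep(1:2, c(20L, 50L))

# Assumption: sets belonging to one collection occupy consecutive rows,
# so each run of identical values corresponds to one collection.
runs <- rle(collection)
size <- runs$lengths

# Each collection starts right after all preceding collections end.
start <- cumsum(c(1L, head(size, -1L)))

derived <- data.frame(collection = runs$values, start = start, size = size)
print(derived)
```

Deriving the values this way avoids hand-maintained offsets drifting out of sync with set.info.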