Fetch the identities of the sets that contain each of a specified subset of genes in the Gesel database. This can be more efficient than fetchSetsForAllGenes when only a few genes are of interest.

fetchSetsForSomeGenes(species, genes, config = NULL)

Arguments

species

String containing the NCBI taxonomy ID of the species of interest.

genes

Integer vector containing gene indices. Each gene index refers to a row of the data frame returned by fetchAllGenes.

config

Configuration list, typically created by newConfig. If NULL, the default configuration is used.

Value

List of integer vectors. Each vector corresponds to a gene in genes and contains the identities of the sets containing that gene. Each set is defined by its set index, which refers to a row of the data frame returned by fetchAllSets.
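Because the return value is a list running parallel to genes, it is often convenient to flatten it into gene-set pairs or to invert it into a set-to-gene mapping. The following is a minimal base-R sketch of that post-processing. It uses a made-up list in place of a real fetchSetsForSomeGenes() call so that it runs offline; the variable names and set indices are illustrative only.

```r
# Toy stand-in for the value of fetchSetsForSomeGenes(); the indices are made up.
genes <- c(1L, 2L, 3L)
sets.per.gene <- list(c(10L, 20L), c(20L, 30L, 40L), integer(0))

# One row per (gene, set) pair; a gene that belongs to no set contributes no rows.
pairs <- data.frame(
    gene = rep(genes, lengths(sets.per.gene)),
    set = unlist(sets.per.gene)
)

# Invert the mapping: for each set index, the requested genes that it contains.
genes.per.set <- split(pairs$gene, pairs$set)

# Set indices shared by the first two requested genes.
shared <- Reduce(intersect, sets.per.gene[1:2])
```

With real data, the set indices in pairs$set (or names(genes.per.set)) can be used to subset the data frame returned by fetchAllSets, as in the examples below.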

Details

Every time this function is called, the set memberships of the requested genes are added to an in-memory cache. Subsequent calls to this function re-use as many of the cached genes as possible and only query the Gesel database for genes that are not yet cached.

If fetchSetsForAllGenes has already been called, its cached data will be used directly by this function to avoid extra requests to the database. If genes is large, it may be more efficient to call fetchSetsForAllGenes first to populate the cache before calling this function.
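The caching behavior described above can be illustrated with a small, self-contained sketch in base R. This is a hypothetical model of per-gene caching, not gesel's actual implementation: fake.request() and cached.fetch() are invented names, and fake.request() simply returns made-up set indices while counting how many simulated database requests are made.

```r
# Hypothetical illustration of per-gene caching; this is NOT gesel's internal code.
state <- new.env()
state$requests <- 0L           # number of simulated database requests
state$requested <- integer(0)  # gene indices sent in those requests

# Made-up stand-in for a single database request; returns fake set indices.
fake.request <- function(genes) {
    state$requests <- state$requests + 1L
    state$requested <- c(state$requested, genes)
    lapply(genes, function(g) g * 10L)
}

cache <- new.env()

# Serve genes from the cache where possible; fetch only the missing ones.
cached.fetch <- function(genes) {
    keys <- as.character(genes)
    known <- vapply(keys, exists, logical(1), envir = cache, inherits = FALSE)
    todo <- unique(genes[!known])
    if (length(todo) > 0L) {
        fetched <- fake.request(todo)
        for (i in seq_along(todo)) {
            assign(as.character(todo[i]), fetched[[i]], envir = cache)
        }
    }
    lapply(keys, get, envir = cache, inherits = FALSE)
}

first <- cached.fetch(1:3)        # one request, for genes 1 to 3
second <- cached.fetch(2:5)       # one request, only for genes 4 and 5
third <- cached.fetch(c(1L, 5L))  # no request; both genes are already cached
```

In this model, calling cached.fetch() once on every gene plays the role that fetchSetsForAllGenes plays for the real function: a single bulk request after which later lookups need no further requests.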

Author

Aaron Lun

Examples

first.gene <- fetchSetsForSomeGenes("9606", 1:5)
str(first.gene)
#> List of 5
#>  $ : int [1:68] 1327 2337 2366 3538 6639 8166 13182 13273 14384 17635 ...
#>  $ : int [1:205] 413 605 701 920 1999 2000 2127 2311 2337 2366 ...
#>  $ : int [1:7] 18984 20717 22134 27718 28006 40230 40391
#>  $ : int [1:160] 1512 2483 3071 19193 19377 20087 20344 20669 20741 21035 ...
#>  $ : int [1:64] 1512 2311 2483 3071 19193 19377 20680 20717 21388 22215 ...

# Sets containing the first gene.
all.set.info <- fetchAllSets("9606")
head(all.set.info[first.gene[[1]],])
#>            name                  description size collection number
#> 1327 GO:0003674           molecular_function  710          1   1327
#> 2337 GO:0005576         extracellular region 1916          1   2337
#> 2366 GO:0005615          extracellular space 1865          1   2366
#> 3538 GO:0008150           biological_process  561          1   3538
#> 6639 GO:0031093 platelet alpha granule lumen   67          1   6639
#> 8166 GO:0034774      secretory granule lumen  115          1   8166

# Identities of the requested genes.
fetchAllGenes("9606")[1:5,]
#>   symbol entrez      ensembl
#> 1   A1BG      1 ENSG0000....
#> 2    A2M      2 ENSG0000....
#> 3  A2MP1      3 ENSG0000....
#> 4   NAT1      9 ENSG0000....
#> 5   NAT2     10 ENSG0000....