Find all sets overlapping any gene in a user-supplied list, and return the number of overlaps per set.

findOverlappingSets(species, genes, counts.only = TRUE, config = NULL)

Arguments

species

String containing the NCBI taxonomy ID of the species of interest.

genes

Integer vector containing gene indices. Each gene index refers to a row of the data frame returned by fetchAllGenes.

counts.only

Logical scalar indicating whether to report only the number of overlapping genes for each set, rather than the identities of those genes.

config

Configuration list, typically created by newConfig. If NULL, the default configuration is used.

Value

A list containing:

  • overlap, a data frame of the overlapping sets. Each row represents a set that is identified by the set index in the set column. (This set index refers to a row of the data frame returned by fetchAllSets.) It also has:

    • The count column, if counts.only=TRUE. This specifies the number of overlaps between the genes in the set and those in genes.

    • The genes column, if counts.only=FALSE. This is a list that contains the entries of genes that overlap with those in the set.

  • present, an integer scalar containing the number of genes in genes that are present in at least one set in the Gesel database for species.
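As a rough illustration of the counts.only=FALSE layout described above, the sketch below builds a mock overlap data frame by hand rather than querying Gesel. The set and gene indices are invented for illustration, and the assumption that the per-set count equals the length of each genes entry is inferred from the descriptions above, not taken from real output.

```r
# Mock stand-in for findOverlappingSets(..., counts.only = FALSE)$overlap.
# All set and gene indices below are invented for illustration only.
overlap <- data.frame(set = c(11L, 42L, 97L))
overlap$genes <- list(c(1L, 4L, 7L), c(2L, 4L), 9L)

# Number of overlapping genes per set, derived from the list column
# (assumed to match what the 'count' column would report).
overlap$count <- lengths(overlap$genes)

# Distinct input genes that overlap at least one of the mock sets.
hit.genes <- sort(unique(unlist(overlap$genes)))
```

Keeping the genes as a list column means each row can hold a different number of overlapping genes, which is why lengths is used instead of a fixed-width column.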

Details

The present value should be used as the number of draws when performing a hypergeometric test for gene set enrichment (see phyper), instead of length(genes). This ensures that genes outside of the Gesel universe are ignored; such genes may be supplied through user error or because of differences between genome versions. Otherwise, unknown genes would inappropriately increase the number of draws and inflate the enrichment p-value.
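A minimal sketch of such a test is shown below, using mock values in place of real Gesel results. The set.sizes and universe values are hypothetical placeholders (in practice they would have to come from the Gesel database for the species), and n.supplied stands in for length(genes) purely for comparison.

```r
# Mock values standing in for the output of findOverlappingSets().
overlap <- data.frame(set = c(2311L, 2483L), count = c(6L, 5L))
present <- 9L      # stand-in for the 'present' element of the output
n.supplied <- 10L  # stand-in for length(genes); one gene is unknown

# Hypothetical placeholders, not real Gesel values.
set.sizes <- c(20L, 150L)  # sizes of the two sets above
universe <- 20000L         # number of genes in the universe

# Upper-tail probability of seeing at least 'count' overlaps,
# using 'present' as the number of draws.
p <- phyper(
    q = overlap$count - 1L,
    m = set.sizes,
    n = universe - set.sizes,
    k = present,
    lower.tail = FALSE
)

# Same test with length(genes) as the number of draws, for comparison.
p.naive <- phyper(
    q = overlap$count - 1L,
    m = set.sizes,
    n = universe - set.sizes,
    k = n.supplied,
    lower.tail = FALSE
)
```

With the same overlap counts, p.naive is larger than p for each set, which is the inflation that using present is meant to avoid.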

Author

Aaron Lun

Examples

overlaps <- findOverlappingSets("9606", 1:10)
head(overlaps$overlap)
#>     set count
#> 1  2311     6
#> 2  2483     5
#> 3 13273     4
#> 4 22837     4
#> 5 24037     4
#> 6 28862     4