Find sets overlapping a list of genes (findOverlappingSets, gesel)

Find all sets that contain any gene in a user-supplied list, and report the overlapping genes (or only their number) for each set.

findOverlappingSets(species, genes, counts.only = TRUE, config = NULL)

Arguments

species: String containing the NCBI taxonomy ID of the species of interest.
genes: Integer vector containing gene indices. Each gene index refers to a row of the data frame returned by fetchAllGenes.
counts.only: Logical scalar indicating whether to report only the number of overlapping genes for each set. If FALSE, the overlapping genes themselves are reported.
config: Configuration list, typically created by newConfig. If NULL, the default configuration is used.

Value

A list containing:

overlap, a data frame of the overlapping sets. Each row corresponds to one set, identified by its set index in the set column. (This index refers to a row of the data frame returned by fetchAllSets.) The data frame also contains:
- a count column, if counts.only=TRUE. This is the number of entries of genes that are members of the set.
- a genes column, if counts.only=FALSE. This is a list in which each entry contains the elements of genes that are members of the set.
present, an integer scalar containing the number of genes in genes that are present in at least one set in the Gesel database for species.
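The relationship between the two output modes can be illustrated with a small mock. This is a sketch on hypothetical data, not real Gesel output: it assumes the counts.only=FALSE result has the shape described above (a set column plus a list column of overlapping gene indices), and shows that the per-set count is simply the length of each list entry.

```r
# Hypothetical mock of the 'overlap' data frame returned with counts.only=FALSE.
# Real values would come from findOverlappingSets(species, genes, counts.only=FALSE).
genes <- c(1L, 2L, 3L, 5L, 7L)  # hypothetical user-supplied gene indices
overlap <- data.frame(set = c(2311L, 2483L))
overlap$genes <- list(c(1L, 2L, 5L), c(3L, 7L))  # list column of overlapping genes

# The number of overlapping genes per set is the length of each list entry;
# this should correspond to the 'count' column reported with counts.only=TRUE.
overlap$count <- lengths(overlap$genes)

# Find which overlapping sets contain a particular gene index, e.g. gene 5.
has.gene5 <- vapply(overlap$genes, function(g) 5L %in% g, logical(1))
overlap$set[has.gene5]  # returns 2311
```

Requesting counts.only=FALSE is only necessary when the identities of the overlapping genes matter; for enrichment testing, the counts alone are sufficient.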

Details

The present value should be used as the number of draws when performing a hypergeometric test for gene set enrichment (see phyper), instead of length(genes). This ensures that genes outside the Gesel universe (e.g., due to user error or a different genome version) are ignored. Otherwise, unknown genes would inappropriately increase the number of draws and inflate the enrichment p-values.
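The sketch below shows how present might feed into phyper. All inputs are hypothetical: the set sizes, the universe size and the number of unknown genes are made-up values standing in for information that would come from the Gesel database and from findOverlappingSets. It computes the one-sided p-value P(X >= count) for each set, once with present as the number of draws and once with the naive length(genes).

```r
# Hypothetical inputs for illustration only; these are NOT real Gesel results.
# In practice, 'count' and 'present' would come from findOverlappingSets(),
# and the set sizes and universe size from the Gesel database for the species.
overlap <- data.frame(set = c(2311L, 2483L, 13273L), count = c(6L, 5L, 4L))
set.size <- c(50L, 200L, 30L)  # hypothetical number of genes in each set
present <- 9L                  # hypothetical: user genes found in at least one set
n.supplied <- 12L              # hypothetical length(genes), incl. 3 unknown genes
universe <- 20000L             # hypothetical number of genes in the Gesel universe

# One-sided enrichment p-value, P(X >= count), where X is hypergeometric.
# phyper(q, m, n, k, lower.tail = FALSE) returns P(X > q), so q = count - 1.
enrichment.p <- function(draws) {
    phyper(
        q = overlap$count - 1L,
        m = set.size,              # "white balls": genes in the set
        n = universe - set.size,   # "black balls": other genes in the universe
        k = draws,                 # number of draws
        lower.tail = FALSE
    )
}

overlap$p.value <- enrichment.p(present)     # draws = present
overlap$p.naive <- enrichment.p(n.supplied)  # draws = length(genes)
print(overlap)
```

Because the upper-tail probability can only grow as the number of draws increases, counting the unknown genes as draws makes every p.naive at least as large as the corresponding p.value, i.e., the test becomes more conservative than it should be.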

Author

Aaron Lun

Examples

overlaps <- findOverlappingSets("9606", 1:10)
head(overlaps$overlap)
#>     set count
#> 1  2311     6
#> 2  2483     5
#> 3 13273     4
#> 4 22837     4
#> 5 24037     4
#> 6 28862     4