findOverlappingSets.Rd
Find all sets overlapping any gene in a user-supplied list, and return the number of overlaps per set.
findOverlappingSets(species, genes, counts.only = TRUE, config = NULL)
species: String containing the NCBI taxonomy ID of the species of interest, e.g., "9606" for human.
genes: Integer vector containing gene indices. Each gene index refers to a row of the data frame returned by fetchAllGenes.
counts.only: Logical scalar indicating whether to report only the number of overlapping genes for each set.
config: Configuration list, typically created by newConfig. If NULL, the default configuration is used.
A list containing:
overlap, a data frame of the overlapping sets. Each row represents a set that is identified by the set index in the set column. (This set index refers to a row of the data frame returned by fetchAllSets.) It also has:
The count column, if counts.only=TRUE. This specifies the number of overlaps between the genes in the set and those in genes.
The genes column, if counts.only=FALSE. This is a list that contains the entries of genes that overlap with those in the set.
present, an integer scalar containing the number of genes in genes that are present in at least one set in the Gesel database for species.
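To make the two shapes of the overlap data frame concrete, the sketch below builds a small mock by hand rather than querying Gesel. The set and gene indices are invented, and the names with.genes, with.counts, has.gene and sets.with.gene are hypothetical. It assumes, based on the descriptions above, that the count column equals the length of each entry of the genes column.

```r
# Mock of the 'overlap' data frame for counts.only=FALSE.
# All indices here are invented for illustration, not real Gesel results.
with.genes <- data.frame(set = c(2311L, 2483L, 13273L))
with.genes$genes <- list(c(1L, 2L, 5L), c(2L, 9L), 7L)

# Assumed relationship: the counts.only=TRUE form is the length of each entry.
with.counts <- data.frame(
    set = with.genes$set,
    count = lengths(with.genes$genes)
)
print(with.counts)

# Find the sets that overlap a particular entry of 'genes', here gene index 2.
has.gene <- vapply(with.genes$genes, function(g) 2L %in% g, logical(1))
sets.with.gene <- with.genes$set[has.gene]
```

The list column is only needed when the identities of the overlapping genes matter; for ranking sets by overlap, the count form is sufficient.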
The present number should be used as the number of draws when performing a hypergeometric test for gene set enrichment (see phyper), instead of length(genes).
It ensures that genes outside of the Gesel universe (e.g., due to user error or different genome versions) are ignored.
Otherwise, unknown genes would inappropriately increase the number of draws and inflate the enrichment p-value.
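A minimal sketch of this effect with phyper follows. Every number is hypothetical: the universe size, set size and overlap count are made up, and the universe is assumed to be the number of genes present in at least one set for the species. The helper enrichment.p is not part of any package.

```r
# All numbers below are hypothetical and chosen only for illustration.
universe <- 20000L  # assumed number of genes in at least one set
set.size <- 100L    # assumed number of genes in the set being tested
count    <- 6L      # overlap for this set, as in the 'count' column
present  <- 8L      # 'present' value, i.e., supplied genes known to Gesel
supplied <- 10L     # length(genes), including 2 genes unknown to Gesel

# Upper-tail hypergeometric probability of seeing at least 'count' overlaps
# when drawing 'draws' genes from the universe.
enrichment.p <- function(draws) {
    phyper(count - 1L, set.size, universe - set.size, draws, lower.tail = FALSE)
}

p.present  <- enrichment.p(present)   # draws = present
p.supplied <- enrichment.p(supplied)  # draws = length(genes)
```

With the overlap held fixed, p.supplied is larger than p.present, because the two unknown genes count as extra draws that could never have overlapped the set.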
overlaps <- findOverlappingSets("9606", 1:10)
head(overlaps$overlap)
#>     set count
#> 1  2311     6
#> 2  2483     5
#> 3 13273     4
#> 4 22837     4
#> 5 24037     4
#> 6 28862     4