Global

Methods

adjustFdr(pvalues, optionsopt) → {Float64Array}

Description:
  • Adjust p-values to control the false discovery rate using the Benjamini-Hochberg method. This is primarily intended for use with p-values from testEnrichment, typically using the total number of sets from numberOfSets as totalTests.

Parameters:
Name Type Attributes Default Description
pvalues Float64Array

Array of p-values.

options object <optional>
{}

Optional parameters.

Properties
Name Type Attributes Default Description
totalTests number <optional>
<nullable>
null

Total number of tests. If null, defaults to the length of pvalues. If greater than the length of pvalues, all tests not in pvalues are assumed to have p-values of 1.

Returns:

Array of length equal to pvalues, containing the BH-adjusted p-values.

Type
Float64Array
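
The effect of totalTests can be illustrated with a minimal standalone sketch of the Benjamini-Hochberg procedure. This is not gesel's source code; the function name bhAdjust is hypothetical, and the sketch only assumes the behavior described above (unobserved tests have p-values of 1).

```javascript
// Illustrative Benjamini-Hochberg adjustment (hypothetical helper, not gesel's implementation).
function bhAdjust(pvalues, totalTests = null) {
    const n = pvalues.length;
    const m = totalTests === null ? n : totalTests;

    // Indices of 'pvalues', ordered by increasing p-value.
    const order = Array.from({ length: n }, (_, i) => i);
    order.sort((a, b) => pvalues[a] - pvalues[b]);

    const adjusted = new Float64Array(n);

    // Tests missing from 'pvalues' are assumed to have p = 1 and occupy the
    // largest ranks, so the running minimum starts at 1.
    let running = 1;
    for (let k = n - 1; k >= 0; k--) {
        const i = order[k];
        running = Math.min(running, (pvalues[i] * m) / (k + 1));
        adjusted[i] = running;
    }
    return adjusted;
}

// Same p-values, adjusted for 3 tests and then for 6 tests in total.
const observed = new Float64Array([0.01, 0.04, 0.03]);
console.log(bhAdjust(observed));
console.log(bhAdjust(observed, 6));
```

Increasing totalTests scales every adjusted value upwards, which is why the total number of sets (rather than only the sets with non-zero overlap) is the usual choice.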

computeEnrichmentCurve(ranking, setMembers, optionsopt) → {object}

Description:
  • Compute an enrichment curve from a gene ranking. At each position in the ranking, the value of the curve is defined as the proportion of genes with the same or higher rank that are present in the gene set. This can be used to visualize the change in enrichment as the ranking changes, typically with respect to some kind of decreasing importance.

Parameters:
Name Type Attributes Default Description
ranking Array | TypedArray

Ranking of genes, where earlier entries are considered to be more highly ranked. Each entry may either be an integer representing a gene (typically a gesel gene ID), or an array of such integers, e.g., as produced by searchGenes.

setMembers Set | Array | TypedArray

Array of integers specifying the genes (typically gesel gene IDs) belonging to the gene set. A preconstructed Set may also be supplied.

options object <optional>
{}

Optional parameters.

Properties
Name Type Attributes Default Description
pseudoCount number <optional>
5

Count to add to the total number of genes when computing the proportion. This avoids large fluctuations at the start of the curve at the cost of biasing the reported proportions.

Returns:

Object containing the following properties:

  • proportions: a Float64Array of length equal to ranking. Each entry contains the proportion of genes with equal or higher ranks that belong to the set.
  • found: a Uint32Array containing the indices of ranking corresponding to the genes that were found in the set.
Type
object
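
A minimal sketch of how such a curve could be computed is shown below. It is not gesel's implementation: the name enrichmentCurve is hypothetical, and it assumes that the pseudo-count is added to the denominator and that an entry with several gene IDs counts as found if any of them is in the set.

```javascript
// Illustrative enrichment curve (hypothetical helper, not gesel's implementation).
function enrichmentCurve(ranking, setMembers, pseudoCount = 5) {
    const members = setMembers instanceof Set ? setMembers : new Set(setMembers);
    const proportions = new Float64Array(ranking.length);
    const found = [];
    let hits = 0;

    for (let i = 0; i < ranking.length; i++) {
        const entry = ranking[i];
        // Each entry is either a single gene ID or an array of gene IDs.
        const ids = typeof entry === "number" ? [entry] : entry;

        let isHit = false;
        for (const id of ids) {
            if (members.has(id)) {
                isHit = true;
                break;
            }
        }
        if (isHit) {
            hits++;
            found.push(i);
        }

        // Assumption: the pseudo-count inflates the number of genes seen so far.
        proportions[i] = hits / (i + 1 + pseudoCount);
    }

    return { proportions, found: new Uint32Array(found) };
}

const curve = enrichmentCurve([3, [7, 8], 5, 1], [8, 1]);
console.log(curve.proportions, curve.found);
```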

countSetOverlaps(setsForSomeGenes) → {Array}

Description:
  • This is a utility function that is called internally by findOverlappingSets. However, it can be used directly to obtain overlap counts if the gene-to-set mappings are manually obtained.

Parameters:
Name Type Description
setsForSomeGenes Array

Array where each entry corresponds to a gene and contains an array of the set IDs containing that gene. Each inner array is typically an entry of the array returned by fetchSetsForSomeGenes.

Returns:

An array of objects, where each object corresponds to a set that is present in at least one entry of setsForSomeGenes. Each object contains:

  • id: the ID of the set in fetchAllSets.
  • count: the number of entries of setsForSomeGenes that contain the set, i.e., the number of supplied genes that belong to it.
Type
Array
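
The counting step can be sketched as follows. This is an illustration only, with a hypothetical function name, and it assumes that each inner array lists a given set ID at most once.

```javascript
// Illustrative overlap counting (hypothetical helper, not gesel's implementation).
function countOverlaps(setsForSomeGenes) {
    const counts = new Map();
    for (const sets of setsForSomeGenes) {
        for (const id of sets) {
            counts.set(id, (counts.get(id) || 0) + 1);
        }
    }
    // One object per set that appears in at least one entry.
    return Array.from(counts, ([id, count]) => ({ id, count }));
}

// Three genes: the first belongs to sets 0 and 2, the second to 2 and 5, the third to 2.
console.log(countOverlaps([[0, 2], [2, 5], [2]]));
```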

effectiveNumberOfGenes(species, config) → {number}

Description:
  • Count the number of genes in the Gesel database that belong to at least one set.

    The return value should be used as the total number of balls when performing a hypergeometric test for gene set enrichment, instead of the length of the arrays returned by fetchAllGenes. This ensures that uninteresting genes like pseudo-genes or predicted genes are ignored during the calculation. Otherwise, unknown genes would inappropriately increase the number of balls and deflate the enrichment p-values, overstating the significance of each overlap.

    See also the documentation for fetchSetsForSomeGenes for some comments about caching.

Parameters:
Name Type Description
species string

The taxonomy ID of the species of interest, e.g., "9606" for human.

config object

Configuration object, see newConfig.

Returns:

Number of genes that belong to at least one set for species. This can be used as a more appropriate universe size in testEnrichment.

Type
number
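
Conceptually, the same count can be derived from a gene-to-set mapping such as the one returned by fetchSetsForAllGenes. The sketch below uses a hypothetical helper name and is not necessarily how gesel computes the value internally.

```javascript
// Count genes that belong to at least one set (hypothetical helper).
// 'setsForAllGenes' has one entry per gene, each listing the IDs of the sets containing it.
function countGenesInAnySet(setsForAllGenes) {
    let n = 0;
    for (const sets of setsForAllGenes) {
        if (sets.length > 0) {
            n++;
        }
    }
    return n;
}

// Four genes, of which two are not in any set.
console.log(countGenesInAnySet([[0, 1], [], new Uint32Array([3]), []]));
```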

(async) fetchAllCollections(species, config) → {Array}

Description:
  • Fetch information about all gene set collections in the Gesel database.

    If this function is called once, the returned array will be cached in memory and re-used in subsequent calls to this function. The cached data will also be used to speed up calls to fetchSomeCollections.

Parameters:
Name Type Description
species string

The taxonomy ID of the species of interest, e.g., "9606" for human.

config object

Configuration object, see newConfig.

Returns:

Array of objects where each entry corresponds to a gene set collection and contains details about that collection. Each object can be expected to contain:

  • title, the title for the collection.
  • description, the description for the collection.
  • species, the species for all gene identifiers in the collection. This should contain the full scientific name, e.g., "Homo sapiens", "Mus musculus".
  • maintainer, the maintainer of this collection.
  • source, the source of this collection, usually a link to some external resource.
  • start, the index of the first set in the collection in the array returned by fetchAllSets. All sets from the same collection are stored contiguously.
  • size, the number of sets in the collection.

In a gesel context, the identifier for a collection (i.e., the "collection ID") is defined as the index of the collection in this array.

Type
Array

(async) fetchAllGenes(species, config, optionsopt) → {Map}

Parameters:
Name Type Attributes Default Description
species string

The taxonomy ID of the species of interest, e.g., "9606" for human.

config object

Configuration object, see newConfig.

options object <optional>
{}

Optional parameters.

Properties
Name Type Attributes Default Description
types Array <optional>
<nullable>
null

Array of strings specifying the identifier types to be retrieved. The exact choice of strings depends on how the references were constructed. If null, it defaults to an array containing "symbol", "entrez" and "ensembl".

Returns:

Map where each key is an identifier type in types. Each value is an array where each element corresponds to a gene and is itself an array of strings containing all identifiers of that type for that gene.

The arrays for different identifier types are all of the same length, and corresponding elements across these arrays describe the same gene. gesel's identifier for each gene (i.e., the "gene ID") is defined as the index of that gene in any of these arrays.

Type
Map

(async) fetchAllSets(species, config) → {Array}

Description:
  • Fetch information about all gene sets in the Gesel database.

    If this function is called once, the returned array will be cached in memory and re-used in subsequent calls to this function. The cached data will also be used to speed up calls to fetchSomeSets.

Parameters:
Name Type Description
species string

The taxonomy ID of the species of interest, e.g., "9606" for human.

config object

Configuration object, see newConfig.

Returns:

Array of objects where each entry corresponds to a set and contains the details about that set. Each object can be expected to contain:

  • name, the name of the set.
  • description, the description of the set.
  • size, the number of genes in the set.
  • collection, the index of the collection containing the set.
  • number, the number of the set within the collection.

In a gesel context, the identifier for a set (i.e., the "set ID") is defined as the index of the set in this array.

Type
Array

(async) fetchCollectionSizes(species, config) → {Array}

Description:
  • Get the size of each gene set collection.

Parameters:
Name Type Description
species string

The taxonomy ID of the species of interest, e.g., "9606" for human.

config object

Configuration object, see newConfig.

Returns:

Number of sets in each collection. Each value corresponds to a collection in fetchAllCollections.

Type
Array

(async) fetchGenesForAllSets(species, config) → {Array}

Description:
  • Fetch the gene membership of all sets in the Gesel database.

    If this function is called once, the returned list will be cached in memory and re-used in subsequent calls to this function. The cached data will also be used to speed up calls to fetchGenesForSomeSets.

Parameters:
Name Type Description
species string

The taxonomy ID of the species of interest, e.g., "9606" for human.

config object

Configuration object, see newConfig.

Returns:

Array of length equal to the total number of sets for this species. Each element corresponds to an entry in fetchAllSets and is a Uint32Array containing the IDs for all genes belonging to that set. Gene IDs refer to indices in fetchAllGenes.

Type
Array

(async) fetchGenesForSomeSets(species, sets, config) → {Array}

Description:
  • Fetch the gene membership of some sets in the Gesel database. This can be more efficient than fetchGenesForAllSets if only a few sets are of interest.

    Every time this function is called, information from the requested sets will be added to an in-memory cache. Subsequent calls to this function will re-use as many of the cached sets as possible before making new requests to the Gesel database.

    If fetchGenesForAllSets was previously called, its cached data will be directly used by fetchGenesForSomeSets to avoid performing extra requests to the database. If sets is large, it may be more efficient to call fetchGenesForAllSets to prepare the cache before calling this function.

Parameters:
Name Type Description
species string

The taxonomy ID of the species of interest, e.g., "9606" for human.

sets Array

Array of set IDs. Each ID is a row index in the array returned by fetchAllSets.

config object

Configuration object, see newConfig.

Returns:

Array of length equal to sets. Each entry is a Uint32Array containing the IDs for all genes belonging to the corresponding set in sets. Gene IDs refer to indices in fetchAllGenes.

Type
Array
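
The incremental caching described for this function (and for the other fetchSome* functions) follows a common pattern, sketched below. All names here (fetchSomeWithCache, cache, fetchMissing) are hypothetical and the code is not taken from gesel; it only illustrates re-using cached entries and requesting the remainder.

```javascript
// Illustrative partial cache (hypothetical helper, not gesel's implementation).
// 'fetchMissing' is an async callback that returns one result per requested ID.
async function fetchSomeWithCache(ids, cache, fetchMissing) {
    // Only request IDs that have not been seen before, without duplicates.
    const missing = Array.from(new Set(ids.filter(id => !cache.has(id))));

    if (missing.length > 0) {
        const fetched = await fetchMissing(missing);
        missing.forEach((id, i) => cache.set(id, fetched[i]));
    }

    // Results are reported in the order of the original request.
    return ids.map(id => cache.get(id));
}
```

With this pattern, a second call that repeats some IDs only triggers a request for the IDs that are new.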

(async) fetchSetSizes(species, config) → {Array}

Description:
  • Get the size of each gene set.

Parameters:
Name Type Description
species string

The taxonomy ID of the species of interest, e.g., "9606" for human.

config object

Configuration object, see newConfig.

Returns:

Number of genes in each set. Each value corresponds to a set in fetchAllSets.

Type
Array

(async) fetchSetsForAllGenes(species, config) → {Array}

Description:
  • Fetch the identities of the sets that contain each gene in the Gesel database.

    If this function is called once, the returned list will be cached in memory and re-used in subsequent calls to this function. The cached data will also be used to speed up calls to fetchSetsForSomeGenes.

Parameters:
Name Type Description
species string

The taxonomy ID of the species of interest, e.g., "9606" for human.

config object

Configuration object, see newConfig.

Returns:

Array of length equal to the total number of genes for this species. Each element corresponds to an entry in fetchAllGenes and is a Uint32Array containing the IDs for all sets containing that gene. Set IDs refer to indices in fetchAllSets.

Type
Array

(async) fetchSetsForSomeGenes(species, genes, config) → {Array}

Description:
  • Fetch the identities of sets that contain some genes in the Gesel database. This can be more efficient than fetchSetsForAllGenes if only a few genes are of interest.

    Every time this function is called, information from the requested genes will be added to an in-memory cache. Subsequent calls to this function will re-use as many of the cached genes as possible before making new requests to the Gesel database.

    If fetchSetsForAllGenes was previously called, its cached data will be directly used by fetchSetsForSomeGenes to avoid extra requests to the database. If genes is large, it may be more efficient to call fetchSetsForAllGenes to prepare the cache before calling this function.

Parameters:
Name Type Description
species string

The taxonomy ID of the species of interest, e.g., "9606" for human.

genes Array

Array of gene IDs. Each ID is a row index in any of the arrays returned by fetchAllGenes.

config object

Configuration object, see newConfig.

Returns:

Array of length equal to genes. Each entry is a Uint32Array containing the IDs for all sets containing the corresponding gene in genes. Set IDs refer to indices in fetchAllSets.

Type
Array

(async) fetchSomeCollections(species, collections, config) → {Array}

Description:
  • Fetch the details of some gene set collections from the Gesel database. This can be more efficient than fetchAllCollections when only a few collections are of interest.

    Every time this function is called, information from the requested collections will be added to an in-memory cache. Subsequent calls to this function will re-use as many of the cached collections as possible before making new requests to the Gesel database.

    If fetchAllCollections was previously called, its cached data will be used by fetchSomeCollections to avoid extra requests to the database. If collections is large, it may be more efficient to call fetchAllCollections to prepare the cache before calling this function.

Parameters:
Name Type Description
species string

The taxonomy ID of the species of interest, e.g., "9606" for human.

collections Array

Array of collection IDs. Each entry is a row index into the array returned by fetchAllCollections.

config object

Configuration object, see newConfig.

Returns:

Array of length equal to collections. Each entry is an object containing details about the corresponding collection in collections.

Type
Array

(async) fetchSomeSets(species, sets, config) → {Array}

Description:
  • Fetch the details of some gene sets from the Gesel database. This can be more efficient than calling fetchAllSets when only a few sets are of interest.

    Every time this function is called, information from the requested sets will be added to an in-memory cache. Subsequent calls to this function will re-use as many of the cached sets as possible before making new requests to the Gesel database.

    If fetchAllSets was previously called, its cached data will be directly used by fetchSomeSets to avoid performing extra requests to the database. If sets is large, it may be more efficient to call fetchAllSets to prepare the cache before calling this function.

Parameters:
Name Type Description
species string

The taxonomy ID of the species of interest, e.g., "9606" for human.

sets Array

Array of set IDs. Each ID is a row index in the array returned by fetchAllSets.

config object

Configuration object, see newConfig.

Returns:

Array of length equal to sets. Each entry is an object containing the set information for the corresponding set in sets.

Type
Array

(async) findOverlappingSets(species, genes, config, optionsopt) → {Array}

Parameters:
Name Type Attributes Default Description
species string

The taxonomy ID of the species of interest, e.g., "9606" for human.

genes Array

Array of unique integers containing user-supplied gene IDs, see fetchAllGenes for details.

config object

Configuration object, see newConfig.

options object <optional>
{}

Optional parameters.

Properties
Name Type Attributes Default Description
includeSize boolean <optional>
true

Whether to include the size of each set in the output.

testEnrichment boolean <optional>
true

Whether to compute the enrichment p-value for each set with testEnrichment. The list and universe sizes will only count genes that are involved in at least one set, by checking fetchSetsForSomeGenes and effectiveNumberOfGenes respectively.

Returns:

An array of objects, where each object corresponds to a set that has non-zero overlaps with genes. Each object contains:

  • id: the ID of the set in fetchAllSets.
  • count: the number of genes in genes that belong to the set.
  • size: the size of the set. Only included if includeSize = true.
  • pvalue: the enrichment p-value. Only included if testEnrichment = true.
Type
Array
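
The functions on this page can be chained into a simple enrichment workflow. The sketch below only uses the signatures and return values documented here; the wrapper name enrichedSets and the threshold parameter are hypothetical, and the gesel module is passed in as an argument so that no assumption is made about how the package is imported.

```javascript
// Illustrative workflow (hypothetical wrapper): gene symbols -> overlapping sets -> BH-adjusted p-values.
async function enrichedSets(gesel, species, symbols, config, threshold = 0.05) {
    // Each query may match zero, one or several gesel gene IDs.
    const matches = await gesel.searchGenes(species, symbols, config);

    // findOverlappingSets expects unique gene IDs.
    const unique = new Set();
    for (const ids of matches) {
        for (const id of ids) {
            unique.add(id);
        }
    }
    const genes = Array.from(unique);

    // By default, each result carries 'id', 'count', 'size' and 'pvalue'.
    const overlaps = await gesel.findOverlappingSets(species, genes, config);

    // Sets without any overlap are not reported, so the total number of
    // sets is supplied as the number of tests.
    const totalTests = await gesel.numberOfSets(species, config);
    const pvalues = Float64Array.from(overlaps, o => o.pvalue);
    const fdr = gesel.adjustFdr(pvalues, { totalTests });

    return overlaps
        .map((o, i) => ({ ...o, fdr: fdr[i] }))
        .filter(o => o.fdr <= threshold)
        .sort((a, b) => a.fdr - b.fdr);
}
```

The set details for the reported IDs could then be retrieved with fetchSomeSets.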

flushMemoryCache(config)

Parameters:
Name Type Description
config object

Configuration object, see newConfig.

Flush all cached objects in config. This is occasionally useful if the cache becomes stale or too large.

intersect(arrays) → {Array}

Parameters:
Name Type Description
arrays Array

Array of arrays over which to compute the intersection.

Returns:

Intersection of all arrays in arrays.

Type
Array
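
One simple way to compute such an intersection is shown below. This is a sketch with a hypothetical function name rather than gesel's implementation; in particular, returning an empty array for an empty input and the ordering of the output are assumptions.

```javascript
// Illustrative intersection of several arrays (hypothetical helper).
function intersectArrays(arrays) {
    if (arrays.length === 0) {
        return []; // Assumption: no arrays means an empty intersection.
    }

    let current = new Set(arrays[0]);
    for (let i = 1; i < arrays.length; i++) {
        // Keep only the values that were also present in all previous arrays.
        const next = new Set();
        for (const x of arrays[i]) {
            if (current.has(x)) {
                next.add(x);
            }
        }
        current = next;
    }
    return Array.from(current);
}

// E.g., combining the set IDs from a text search with those from an overlap search.
console.log(intersectArrays([[1, 2, 3], [2, 3, 4], [3, 2]]));
```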

(async) mapGenesByIdentifier(species, type, config, optionsopt) → {Map}

Parameters:
Name Type Attributes Default Description
species string

The taxonomy ID of the species of interest, e.g., "9606" for human.

type string

Type of the identifier to use as the key of the map, e.g., "ensembl".

config object

Configuration object, see newConfig.

options object <optional>
{}

Optional parameters.

Properties
Name Type Attributes Default Description
lowerCase boolean <optional>
false

Whether to use lower-case keys in the map.

Returns:

Map where each key is a string containing a (possibly lower-cased) identifier of the specified type and each value is an array. Each array contains the gesel gene IDs associated with that identifier, see fetchAllGenes for more details.

Type
Map
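
The structure of the returned Map can be illustrated by building one from a single identifier array, such as the "symbol" entry of fetchAllGenes. The helper name buildIdentifierMap is hypothetical and the code is a sketch, not gesel's implementation.

```javascript
// Illustrative reverse mapping from identifiers to gesel gene IDs (hypothetical helper).
// 'identifiers' has one entry per gene, each an array of strings for that gene.
function buildIdentifierMap(identifiers, lowerCase = false) {
    const mapping = new Map();
    identifiers.forEach((ids, geneId) => {
        for (const raw of ids) {
            const key = lowerCase ? raw.toLowerCase() : raw;
            const existing = mapping.get(key);
            if (existing === undefined) {
                mapping.set(key, [geneId]);
            } else if (existing[existing.length - 1] !== geneId) {
                // Avoid listing the same gene twice for one key.
                existing.push(geneId);
            }
        }
    });
    return mapping;
}

const symbols = [["TP53"], ["Abc", "ABC"], [], ["abc"]];
console.log(buildIdentifierMap(symbols, true).get("abc"));
```

Because one identifier can map to several genes, each value is an array of gene IDs rather than a single ID.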

newConfig(fetchGene, fetchFile, fetchRanges, optionsopt) → {object}

Description:
  • Create a new configuration object to specify how the Gesel database should be queried. This can be used in each gesel function to point to a different Gesel database from the default.

    The configuration object also contains a cache of data structures that can be populated by gesel functions. This avoids unnecessary fetch requests upon repeated calls to the same function. If the cache becomes stale or too large, it can be cleared by calling flushMemoryCache.

Parameters:
Name Type Attributes Default Description
fetchGene function

Function that accepts the name of a Gesel gene description file and returns an ArrayBuffer of its contents. This may be async.

fetchFile function

Function that accepts the name of a Gesel database file and returns an ArrayBuffer of its contents. This may be async.

fetchRanges function

Function that accepts three arguments:

  • name, the name of the file in the Gesel database.
  • start, an array of integers containing the 0-based closed starts of the byte ranges.
  • end, an array of integers containing the 0-based open ends of the byte ranges. This is of the same length as start, such that the i-th range is defined as [start[i], end[i]).

It should return an array of ArrayBuffers of the same length as start, where each entry has the contents of the corresponding byte range. This may be async.

options object <optional>
{}

Optional parameters.

Properties
Name Type Attributes Default Description
consolidateBlockSize number <optional>
10000

Block size for consolidation, in bytes. gesel functions will consolidate near-adjacent ranges into larger blocks to reduce the number of requests. Larger block sizes will reduce the number of requests at the cost of larger requests.

Returns:

A configuration object.

Type
object
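
The contract for fetchRanges can be illustrated with an in-memory implementation, which may be convenient for testing. This is a sketch under assumptions: the factory name makeInMemoryFetchRanges and the files Map are hypothetical, and a real application would typically issue range requests against a remote server instead.

```javascript
// Illustrative fetchRanges backed by in-memory buffers (hypothetical helper).
// 'files' is a Map from file name to an ArrayBuffer with the file contents.
function makeInMemoryFetchRanges(files) {
    return async function fetchRanges(name, start, end) {
        const buffer = files.get(name);
        if (buffer === undefined) {
            throw new Error(`no such file: ${name}`);
        }
        // The i-th range is the half-open interval [start[i], end[i]),
        // which matches the semantics of ArrayBuffer.prototype.slice.
        return start.map((s, i) => buffer.slice(s, end[i]));
    };
}
```

For a remote database, the same half-open range would correspond to an HTTP Range header of the form bytes=start-(end - 1), as HTTP byte ranges are inclusive at both ends.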

(async) numberOfCollections(species, config) → {number}

Description:
  • Get the total number of gene set collections.

Parameters:
Name Type Description
species string

The taxonomy ID of the species of interest, e.g., "9606" for human.

config object

Configuration object, see newConfig.

Returns:

Total number of collections for this species.

Type
number

(async) numberOfSets(species, config) → {number}

Description:
  • Get the total number of gene sets.

Parameters:
Name Type Description
species string

The taxonomy ID of the species of interest, e.g., "9606" for human.

config object

Configuration object, see newConfig.

Returns:

Total number of sets for this species.

Type
number

reindexGenesForAllSets(geneMapping, genesForSets) → {Array}

Description:
  • Reindex the gene sets for a user-defined gene universe. This is helpful for applications that know their own gene universe and want to convert the gesel gene IDs to indices within that universe.

Parameters:
Name Type Description
geneMapping Array

Array of length equal to the number of genes in a user-defined gene universe. Each entry corresponds to one gene in the user's universe and should be an array containing the corresponding gesel gene ID(s) (see fetchAllGenes for details).

genesForSets Array

Array of length equal to the number of reference gene sets. Each entry corresponds to a set and is an array containing gesel gene IDs for all genes in that set. This is typically obtained from fetchGenesForAllSets.

Returns:

Array of length equal to genesForSets. Each entry corresponds to a reference gene set and is a Uint32Array where the elements are indices into geneMapping, specifying the genes in the user's universe that belong to that set. If a gene in geneMapping maps to multiple gesel IDs, it is considered to belong to all sets containing any of its mapped gesel gene IDs.

Type
Array
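
The reindexing can be pictured with the following standalone sketch. The function name is hypothetical and this is not gesel's source; sorting the output indices is an assumption made for readability.

```javascript
// Illustrative reindexing of set membership into a user-defined universe (hypothetical helper).
function reindexGenesForSets(geneMapping, genesForSets) {
    // Invert the mapping: gesel gene ID -> indices in the user's universe.
    const reverse = new Map();
    geneMapping.forEach((geselIds, userIndex) => {
        for (const g of geselIds) {
            if (!reverse.has(g)) {
                reverse.set(g, []);
            }
            reverse.get(g).push(userIndex);
        }
    });

    return genesForSets.map(genes => {
        const hits = new Set();
        for (const g of genes) {
            for (const u of reverse.get(g) || []) {
                hits.add(u);
            }
        }
        // Typed arrays sort numerically by default.
        return Uint32Array.from(hits).sort();
    });
}

// User gene 1 maps to two gesel IDs; user gene 2 has no gesel counterpart.
const mapping = [[10], [11, 12], [], [12]];
console.log(reindexGenesForSets(mapping, [[10, 12], [11], [99]]));
```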

reindexSetsForAllGenes(geneMapping, setsForGenes) → {Array}

Description:
  • Reindex the gene-to-set mappings for a user-defined gene universe. This is helpful for applications that know their own gene universe and want to create a mapping of all sets containing each of their own genes.

Parameters:
Name Type Description
geneMapping Array

Array of length equal to the number of genes in a user-defined gene universe. Each entry corresponds to one gene in the user's universe and should be an array containing the corresponding gesel gene ID(s) (see fetchAllGenes for details).

setsForGenes Array

Array of length equal to the number of gesel gene IDs. Each entry corresponds to a gesel gene ID and is an array containing the set IDs for all sets containing that gene. This is typically obtained from fetchSetsForAllGenes.

Returns:

Array of length equal to geneMapping. Each entry corresponds to a gene in the user-supplied universe and is a Uint32Array where the elements are the gesel set IDs containing that gene. If a gene in geneMapping maps to multiple gesel IDs, we report all sets containing any of its mapped gesel gene IDs.

Type
Array
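
A standalone sketch of this reindexing is given below. As before, the function name is hypothetical, the code is not gesel's source, and the sorted output is an assumption.

```javascript
// Illustrative reindexing of gene-to-set mappings (hypothetical helper).
// 'setsForGenes' is indexed by gesel gene ID.
function reindexSetsForGenes(geneMapping, setsForGenes) {
    return geneMapping.map(geselIds => {
        // Take the union of the sets across all gesel IDs mapped to this gene.
        const sets = new Set();
        for (const g of geselIds) {
            for (const s of setsForGenes[g] || []) {
                sets.add(s);
            }
        }
        return Uint32Array.from(sets).sort();
    });
}

const setsForGenes = [[0, 1], [1], [2], []];
console.log(reindexSetsForGenes([[0, 1], [3], [], [2, 0]], setsForGenes));
```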

(async) searchGenes(species, queries, config, optionsopt) → {Array}

Parameters:
Name Type Attributes Default Description
species string

Taxonomy ID of the species of interest, e.g., "9606" for human.

queries Array

Array of strings containing gene identifiers of some kind (e.g., Ensembl, symbol, Entrez).

config object

Configuration object, see newConfig.

options object <optional>
{}

Optional parameters.

Properties
Name Type Attributes Default Description
types Array <optional>
<nullable>
null

Array of strings specifying the identifier types to use for searching. The exact choice of strings depends on how the references were constructed. If null, it defaults to an array containing "entrez", "ensembl" and "symbol".

ignoreCase boolean <optional>
true

Whether to perform case-insensitive matching.

Returns:

An array of length equal to queries. Each element of the array is an array containing the gesel gene IDs with any identifiers that match the corresponding search string. See fetchAllGenes for more details on the interpretation of these IDs.

Type
Array

(async) searchSetText(species, query, config, optionsopt) → {Array}

Parameters:
Name Type Attributes Default Description
species string

The taxonomy ID of the species of interest, e.g., "9606" for human.

query string

Query string containing multiple words to search in the names and/or descriptions of each set.

Each stretch of alphanumeric characters and dashes is treated as a single word. All other characters are treated as punctuation between words, except for the following wildcards:

  • *: match zero or more alphanumeric or dash characters.
  • ?: match exactly one alphanumeric or dash character.

A set's name and/or description must contain all words in query to be considered a match.

config object

Configuration object, see newConfig.

options object <optional>
{}

Optional parameters.

Properties
Name Type Attributes Default Description
inName boolean <optional>
true

Whether to search the name of the set for matching words.

inDescription boolean <optional>
true

Whether to search the description of the set for matching words.

Returns:

Array of indices of the sets with names and/or descriptions that match query.

Type
Array
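
The tokenization and wildcard rules for query can be illustrated with regular expressions. The sketch below is not how gesel performs the search (its helper names are hypothetical, and it does not use the database's search indices); it also assumes case-insensitive matching, which is not stated above.

```javascript
// Convert a query into one regular expression per word (hypothetical helper).
function queryToPatterns(query) {
    // Words are stretches of alphanumerics and dashes, plus the two wildcards.
    const words = query.match(/[A-Za-z0-9\-*?]+/g) || [];
    return words.map(word => {
        const body = word
            .replace(/\*/g, "[A-Za-z0-9-]*") // zero or more word characters
            .replace(/\?/g, "[A-Za-z0-9-]"); // exactly one word character
        return new RegExp(`^${body}$`, "i"); // assumption: case-insensitive
    });
}

// Check whether a name or description contains all words in the query.
function textMatches(text, query) {
    const words = text.match(/[A-Za-z0-9-]+/g) || [];
    return queryToPatterns(query).every(p => words.some(w => p.test(w)));
}

console.log(textMatches("T-cell receptor signaling pathway", "t-cell signal*"));
console.log(textMatches("B cell activation", "t-cell"));
```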

testEnrichment(overlap, listSize, setSize, universe) → {number}

Description:
  • Hypergeometric test for gene set enrichment, based on the overlap between a user-supplied list and the gene set.

Parameters:
Name Type Description
overlap number

Number of overlapping genes between the user's list and the gene set, typically obtained from findOverlappingSets.

listSize number

Size of the user's list.

setSize number

Size of the gene set, see the size property of each entry in fetchAllSets.

universe number

Size of the gene universe (i.e., the total number of genes for this species). This can either be obtained from the length of the arrays in fetchAllGenes or by using effectiveNumberOfGenes.

Returns:

P-value for the enrichment of the user's list in the gene set. This may be NaN if the inputs are inconsistent, e.g., overlap is greater than listSize or setSize.

Type
number
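
For reference, a hypergeometric enrichment p-value can be computed as in the sketch below. This is an illustration and not gesel's implementation: the function name is hypothetical, and it assumes that the p-value is the upper tail probability of observing at least overlap genes from the set in the list.

```javascript
// Illustrative upper-tail hypergeometric test (hypothetical helper).
function hypergeometricUpperTail(overlap, listSize, setSize, universe) {
    if (overlap > listSize || overlap > setSize || listSize > universe || setSize > universe) {
        return NaN; // inconsistent inputs
    }

    // logFactorial[k] = log(k!), used to evaluate binomial coefficients stably.
    const logFactorial = new Float64Array(universe + 1);
    for (let k = 1; k <= universe; k++) {
        logFactorial[k] = logFactorial[k - 1] + Math.log(k);
    }
    const logChoose = (n, k) => logFactorial[n] - logFactorial[k] - logFactorial[n - k];

    const denominator = logChoose(universe, listSize);
    const outside = universe - setSize; // genes not in the set
    const lower = Math.max(overlap, listSize - outside, 0);
    const upper = Math.min(listSize, setSize);

    // Sum P(X = k) over all overlaps at least as large as the observed one.
    let p = 0;
    for (let k = lower; k <= upper; k++) {
        p += Math.exp(logChoose(setSize, k) + logChoose(outside, listSize - k) - denominator);
    }
    return Math.min(1, p);
}

// 2 of 3 listed genes fall in a set of size 4, out of 10 genes in total.
console.log(hypergeometricUpperTail(2, 3, 4, 10));
```

Using a smaller universe, such as the value from effectiveNumberOfGenes, increases the expected overlap and therefore yields a more conservative p-value.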