gesel
Validating Gesel database files
gesel Namespace Reference

Validate Gesel database and gene files.

Functions

void validate_database (const std::string &prefix, uint64_t num_genes)
uint64_t validate_genes (const std::string &prefix, const std::vector< std::string > &types)
uint64_t validate_genes (const std::string &prefix)

Detailed Description

Validate Gesel database and gene files.

Function Documentation

◆ validate_database()

inline void gesel::validate_database ( const std::string & prefix, uint64_t num_genes )

Validate Gesel database files for a particular species. This checks all files for validity and consistency except for the gene mapping files (which are validated by validate_genes()). Any invalid formatting or inconsistency between files will result in an error.

Parameters
prefix: Prefix for the Gesel database files. This should be of the form <DIRECTORY>/<SPECIES>_, where <SPECIES> is an NCBI taxonomy ID.
num_genes: Total number of genes for this species.
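
The prefix format described above can be assembled from a directory and a taxonomy ID. The following is a minimal sketch; make_prefix is a hypothetical helper for illustration and is not part of gesel, and it assumes '/' as the path separator.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of gesel): joins a directory and an NCBI
// taxonomy ID into a prefix of the form <DIRECTORY>/<SPECIES>_.
inline std::string make_prefix(const std::string& directory, const std::string& species) {
    std::string prefix = directory;
    // Add a separator only if the directory is non-empty and lacks one.
    if (!prefix.empty() && prefix.back() != '/') {
        prefix += '/';
    }
    prefix += species;
    prefix += '_';
    return prefix;
}
```

For example, make_prefix("db", "9606") yields "db/9606_", where 9606 is the NCBI taxonomy ID for human. The resulting string is what would be passed as the prefix argument.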

◆ validate_genes() [1/2]

inline uint64_t gesel::validate_genes ( const std::string & prefix )

Overload of validate_genes() that takes no types argument. Instead, it scans the directory for all files starting with prefix and ending with ".tsv.gz".

Parameters
prefix: Prefix for the Gesel gene files. This should be of the form <DIRECTORY>/<SPECIES>_, where <SPECIES> is an NCBI taxonomy ID.
Returns
Number of genes.
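
The directory scan described for this overload can be pictured with the sketch below. It only mirrors the stated rule (file names that start with prefix and end with ".tsv.gz"); find_gene_types is a hypothetical helper, and whether gesel performs the match in exactly this way is an assumption. It requires C++17 for std::filesystem.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of gesel): returns the <TYPE> part of every
// file named <PREFIX><TYPE>.tsv.gz, e.g. "symbol" for "db/9606_symbol.tsv.gz".
inline std::vector<std::string> find_gene_types(const std::string& prefix) {
    namespace fs = std::filesystem;
    const std::string suffix = ".tsv.gz";

    // Split the prefix into the directory to scan and the file name stem.
    const fs::path as_path(prefix);
    fs::path dir = as_path.parent_path();
    if (dir.empty()) {
        dir = ".";
    }
    const std::string stem = as_path.filename().string();

    std::vector<std::string> types;
    for (const auto& entry : fs::directory_iterator(dir)) {
        const std::string name = entry.path().filename().string();
        // Skip names too short to hold the stem, a non-empty type and the suffix.
        if (name.size() <= stem.size() + suffix.size()) continue;
        if (name.compare(0, stem.size(), stem) != 0) continue;
        if (name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) continue;
        types.push_back(name.substr(stem.size(), name.size() - stem.size() - suffix.size()));
    }
    // Directory iteration order is unspecified, so sort for a stable result.
    std::sort(types.begin(), types.end());
    return types;
}
```

Note that any file in the directory matching the pattern is picked up by this sketch, so the directory is assumed to hold only gene files under that prefix.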

◆ validate_genes() [2/2]

inline uint64_t gesel::validate_genes ( const std::string & prefix, const std::vector< std::string > & types )

Validate Gesel gene mapping files for a particular species. Any invalid formatting or inconsistency between files will result in an error.

Parameters
prefix: Prefix for the Gesel gene mapping files. This should be of the form <DIRECTORY>/<SPECIES>_, where <SPECIES> is an NCBI taxonomy ID.
types: Vector of gene name types, e.g., "ensembl", "symbol". This should contain at least one value.
Returns
Number of genes.
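
Because validate_genes() returns the number of genes and validate_database() expects that number as num_genes, the two calls chain naturally. The sketch below illustrates that ordering. validate_species is a hypothetical wrapper, and the two callables stand in for the gesel functions so that the block compiles without the library; the header to include for the real functions is not stated here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical wrapper (not part of gesel): validates the gene files first,
// then passes the returned gene count on when validating the database files.
// ValidateGenes stands in for gesel::validate_genes(prefix) and
// ValidateDatabase for gesel::validate_database(prefix, num_genes).
template<class ValidateGenes, class ValidateDatabase>
std::uint64_t validate_species(
    const std::string& gene_prefix,
    const std::string& database_prefix,
    ValidateGenes validate_genes,
    ValidateDatabase validate_database)
{
    const std::uint64_t num_genes = validate_genes(gene_prefix);
    validate_database(database_prefix, num_genes);
    return num_genes;
}
```

With the library available, the callables would presumably be thin lambdas forwarding to gesel::validate_genes and gesel::validate_database. Separate prefixes are accepted because the gene files and database files are described independently and may live in different directories.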