The large and growing need for antiparasitic drugs (attributable to poor efficacy, high toxicity, and/or emergence of drug resistance), coupled with increasing interest in this area from both academic and pharmaceutical sector research programs, has motivated the development of this "Drug Target Portfolio" database, intended to facilitate filtering and prioritizing potential drug targets for parasitic organisms. Focusing on high-priority diseases flagged by the Tropical Disease Research program of the WHO (including tuberculosis, leprosy, malaria, sleeping sickness, Chagas disease, leishmaniasis, lymphatic filariasis, schistosomiasis, and onchocerciasis), this in silico strategy brings together data and annotation emerging from genome sequencing and functional genomics projects, structural data, manual curation of inhibitors and targets, and information on target essentiality and druggability. Where information is not available for organisms or targets of interest, insight may often be inferred through orthology (cf. OrthoMCL-DB). For example: high-confidence structural models have been generated for many parasite proteins based on crystal structures of orthologous proteins; while genome-wide essentiality data are not available for any parasitic organism, genes known to be essential in other species (S. cerevisiae, C. elegans, etc.) may be informative; and target druggability (and even candidate inhibitors) can often be inferred by analogy to extensive studies conducted on orthologous proteins in bacteria. Users may query the TDR Targets database based on any of the above criteria, alone or in combination. A 'history' function permits individual queries to be combined, using user-defined weights to create a ranked list of genes, which may be downloaded in tab-delimited format for entry into spreadsheets. The database also allows users to post their ranked lists of genes for others to view or modify.
A guide to the various datasets, database structure and functionality is outlined below.
Species for which datasets are currently loaded in the database:
Species for which future loading of datasets is planned:
This category captures basic genome information: gene ID, gene name, gene product name, exon count, gene length, protein length, and the protein's molecular weight, isoelectric point, hydrophobicity, number of transmembrane domains and presence of a signal peptide. Data were obtained from the respective genome databases (GenBank, GeneDB, PlasmoDB, ToxoDB, Leproma, and TubercuList).
Genes were classified into enzyme, transporter and receptor categories as follows. Enzymes: genes that have one or more of the following features: 1) an EC number; 2) a GO term for catalytic activity (GO:0003824) or one of its more specific subterms, e.g. kinase activity (GO:0016301); 3) an annotated enzyme name, e.g. dehydrogenase or calpain; 4) orthology to known enzymes from other organisms (e.g. Saccharomyces cerevisiae). Transporters: genes that have one or more of the following features: 1) a GO term for transporter activity (GO:0005215) or one of its more specific subterms, e.g. chloride channel activity (GO:0005254), excluding any genes associated with non-transmembrane transport, e.g. carrier proteins or proteins involved in vesicle transport; 2) an annotated transporter name, e.g. pteridine transporter. Receptors: genes that have one or more of the following features: 1) a GO term for receptor activity (GO:0004872) or one of its child terms, e.g. peptide receptor activity (GO:0001653); 2) the word 'receptor' in their annotated name.
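The classification rules above can be sketched in a few lines of Python. This is only an illustration, not the database's own code: the record layout (`go_terms`, `ec_numbers`, `product_name`, `ortholog_of_known_enzyme`) and the short keyword list are assumptions made for the example, and `go_terms` is assumed to already include ancestor terms so that a subterm such as kinase activity also carries GO:0003824. The exclusion of non-transmembrane transport is not modeled here.

```python
# GO identifiers quoted in the text above.
GO_CATALYTIC = "GO:0003824"
GO_TRANSPORTER = "GO:0005215"
GO_RECEPTOR = "GO:0004872"

# Hypothetical, deliberately tiny list of enzyme name keywords.
ENZYME_NAME_HINTS = ("dehydrogenase", "calpain")


def classify_gene(gene):
    """Return the set of categories that apply to one gene record (a dict)."""
    go = set(gene.get("go_terms", ()))
    name = gene.get("product_name", "").lower()
    categories = set()

    # Enzyme: EC number, catalytic GO term, enzyme-like name, or enzyme ortholog.
    if (gene.get("ec_numbers")
            or GO_CATALYTIC in go
            or any(hint in name for hint in ENZYME_NAME_HINTS)
            or gene.get("ortholog_of_known_enzyme", False)):
        categories.add("enzyme")

    # Transporter: transporter GO term or a transporter name.
    if GO_TRANSPORTER in go or "transporter" in name:
        categories.add("transporter")

    # Receptor: receptor GO term or 'receptor' in the name.
    if GO_RECEPTOR in go or "receptor" in name:
        categories.add("receptor")

    return categories
```

A gene can fall into more than one category, which is why the function returns a set rather than a single label.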
This category captures the functional annotation available for individual genes: protein domains (Pfam / InterPro), Gene Ontology (GO) annotation, and EC number annotation for enzymes. Data were obtained from the respective pathogen genome databases, and InterPro scans were run to get the most recent protein family and GO assignments. Using the GO annotation data, GO slim categories were created to assign genes to functional class and metabolic process categories.
This category captures the availability of crystal structures for proteins. In addition, protein structures were modeled on a genome-wide scale for all genomes of interest. Structure models for either the whole protein or part of a protein (a protein domain) were built from a template structure. Structure data were obtained from the PDB, while structure models were obtained from Andrej Sali's lab and can be accessed at http://modbase.compbio.ucsf.edu/modbase-cgi/index.cgi.
Phyletic distribution data for orthologs were obtained from the OrthoMCL database. Paralogs and orthologs for individual genes were identified and clustered into ortholog groups by reciprocal best BLAST hits (all-against-all) and Markov clustering. Clustering orthologous genes from different species makes it possible to transfer functional information for a gene from a reference species to the parasite species of interest. These data are also useful for identifying gene duplications and the expansion of gene families. More information on ortholog identification and clustering can be obtained from http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi?rm=orthomcl#Background
This is a collection of experimental data on gene essentiality from various species. Genome-wide gene knockout and knockdown data from certain species (C. elegans, E. coli, S. cerevisiae and M. tuberculosis) were used to categorize orthologous genes from the parasitic species of interest as essential or not essential. More data from other reference species will be added in the future. Data for essentiality were obtained from the Saccharomyces Genome Database (SGD), Profiling of E. coli Chromosome (PEC), Keio Collection, National Microbial Pathogen Data Resource (NMPDR), WormBase and New England Biolabs (NEB).
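As a rough illustration of how essentiality can be carried over through ortholog groups, the sketch below marks a parasite gene as essential when any member of its ortholog group is essential in a reference species. The function name, the data layout and the "any member" rule are assumptions made for this example; the text above does not spell out the exact rule the database applies.

```python
def infer_essentiality(ortholog_groups, essential_reference_genes, parasite_genes):
    """Map each grouped parasite gene to True/False (essential by orthology).

    ortholog_groups: dict of group ID -> iterable of member gene IDs
    essential_reference_genes: gene IDs found essential in reference species
    parasite_genes: gene IDs belonging to the parasite genome of interest
    """
    essential = set(essential_reference_genes)
    parasite = set(parasite_genes)
    inferred = {}
    for members in ortholog_groups.values():
        members = set(members)
        # Assumed rule: one essential reference ortholog is enough.
        group_is_essential = bool(members & essential)
        for gene in members & parasite:
            inferred[gene] = group_is_essential
    return inferred
```

Parasite genes that belong to no ortholog group are left out of the result, since nothing can be inferred for them this way.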
The druggability analysis estimates the likelihood that a protein is druggable. Two kinds of estimates are available: the druggability index (Dindex) value and the compound desirability value.
The Dindex is a composite score consisting of a weighted normalised sum, in which each druggability prediction method is given a weight that depends on its relative contribution to the prediction. Dindex values range from 0 to 1, and a larger value means that the gene is more likely to be a druggable target. Sequence similarity searches (using ortholog clustering and BLAST) against a database of known targets derived from the latest Inpharmatica literature SAR database (Starlite) linked a large number of proteins in TDR priority species to a known druggable target with at least one small-molecule compound with a binding affinity of less than 10 µM.
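A weighted normalised sum of this kind can be sketched as follows. The actual prediction methods and weights behind the Dindex are proprietary and are not given here, so the function below only illustrates the general form, assuming each method yields a score already scaled to the range 0 to 1 and each weight is non-negative.

```python
def weighted_normalised_sum(scores, weights):
    """Combine per-method scores (each in [0, 1]) into one value in [0, 1]."""
    if len(scores) != len(weights) or not scores:
        raise ValueError("scores and weights must be non-empty and equally long")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    # Dividing by the total weight keeps the result within [0, 1].
    return sum(w * s for w, s in zip(weights, scores)) / total_weight
```

For example, two methods scoring 1.0 and 0.0 with weights 3 and 1 combine to 0.75.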
The compound desirability index is a fitness value that summarizes the average "chemical quality" of each target. The desirability function is based on Harrington's desirability index and draws on the molecular properties and oral distribution of small-molecule drugs. The function also contains penalty functions for acidity, promiscuity and structural alerts (risk and reactive groups in compounds). The compound desirability value therefore links directly to actual compounds (which could be the basis for composing target-specific screening subsets of compounds), but the compounds have not been disclosed and are not available for searching and/or display.
The above two datasets result from a combined analysis by Pfizer and Inpharmatica, and this information is proprietary.
Drug-to-gene association data available from DrugBank were mined, and homologs and orthologs of these genes were mapped to the pathogen species of interest. While some of the compounds in DrugBank might prove relevant for parasitic diseases, it is more likely that these compounds are best used as informatics probes to identify small, focused diversity sets for screening. This dataset was obtained from Robert Campbell of Brandeis University. In addition, associations of drugs/compounds with genes were mined from the literature through PubMed.
Putative antigenic peptides were predicted for genes using the method of Kolaskar and Tongaonkar, as implemented in EMBOSS (antigenic). Each predicted epitope has an associated score based on the physicochemical properties of the amino acid residues. A cumulative antigenicity score was calculated as the sum of the scores of all predicted epitopes for a given protein. A normalized antigenicity index was calculated as the ratio of this cumulative score to the protein length. Finally, percentile values for the antigenicity index were obtained by calculating the percentage of proteins in a genome that fall below a given antigenicity index. Thus, a query for an antigenicity index percentile greater than 80 will retrieve the proteins whose antigenicity index values fall in the top 20 percent for the given genome.
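The arithmetic described above can be sketched as follows. The epitope scores are assumed to come from a separate EMBOSS antigenic run, which is not reproduced here; the function names and the input layout are illustrative only.

```python
from bisect import bisect_left


def antigenicity_index(epitope_scores, protein_length):
    """Cumulative epitope score divided by protein length."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return sum(epitope_scores) / protein_length


def antigenicity_percentiles(index_by_gene):
    """Percent of proteins in the genome with a strictly lower index."""
    ordered = sorted(index_by_gene.values())
    total = len(ordered)
    return {
        # bisect_left counts the values strictly below this gene's index.
        gene: 100.0 * bisect_left(ordered, value) / total
        for gene, value in index_by_gene.items()
    }
```

Normalizing by length keeps long proteins from ranking highly simply because they contain more predicted epitopes.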
Data on functional studies of pathogen genes using genetic and biochemical techniques were collected from the literature, either by manually reviewing publications on individual genes or through community-wide surveys. The collected data are represented in a structured ontology format to make querying and retrieval easier. The structured ontology represents validation of the phenotypes observed for each gene. Validation is classified as genetic or chemical. Under genetic validation, data are available for validation of phenotype by overexpression, loss-of-function mutant, knockout unrecovered and RNAi/antisense assay. Under chemical validation, data are available for validation of phenotype by cell-free assay, in vitro culture assay, animal system and clinical assay. The phenotypes observed in these studies are categorized as 'abnormal' or 'lethal.' This curation was primarily carried out by the TDR Drug Target Group. The community-wide survey for drug targets in Human African Trypanosomiasis (HAT) was initiated by the Drug Discovery group at the University of Dundee following a workshop on drug discovery for trypanosomatid diseases and was hosted at https://decide.ideareach.com/. Further such surveys for helminth and other protozoan parasites will be conducted at the tdrtargets.org site in the near future. All survey datasets will be uploaded into the database using the structured ontology format.
Reference data for genes were mined from PubMed.
The database is open access and lightweight, and it is simple to use: it has only a SEARCH page, a HISTORY page and a POSTED LISTS OF TARGETS page.
This page allows the user to query the genome of a pathogen of interest using one or more of the criteria listed on the page (and discussed above). The user can run the query by clicking on any of the 'search' buttons anywhere in the page. The user also has the option of naming the query either in the SEARCH page or on the HISTORY page. The search results are displayed as a list of genes on a separate page, displaying species name, gene ID, ortholog group id, gene product name and genome source database. Clicking on the gene ID will open a new page containing all associated information available for that gene in the database along with useful links to external databases. After running one or more queries the user can go to the HISTORY page for further refinement and processing of the data.
On this page, users can view all queries that have been run in the current session or in a previous session. Users can also Save and Post/Publish saved queries for community-wide viewing. To use all of the functions on the HISTORY page, users are asked to register for a login account. The user name is the user's email address and the password can be anything the user chooses. The Upload function allows the user to upload a list of genes from a given pathogen, which will be listed as a new query on the HISTORY page and be available for combining and analysis with other available queries. The upload file should be a text file consisting of a list of gene IDs that match the gene IDs in the given pathogen's genome database.
The history functions (Union, Intersection, Subtraction, Delete, Rename, and Save) can be accessed from the Combine Selected Queries section on the HISTORY page. Users can select one or more queries from the My Queries section (by clicking in the box to the left of each query) and manipulate them by choosing any one of the history functions (by clicking on the circle to the left of the function name).
Ranking genes using the Union function on the HISTORY page: An important and novel feature of the database is that it allows users to assign scores to, and rank, the genes obtained as query results. Users can rank the combined set of genes obtained from the Union of two or more queries. This is done by assigning an integer value (either positive or negative) to each of the selected queries and performing a Union of all the selected queries. The score (also referred to as Weight in the database) is entered in the box to the left of the Weight term listed below each query. The result is a combined list of genes from the selected queries, sorted in decreasing order of the score associated with each gene. The scores are additive across the selected queries.
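The weighted Union described above amounts to summing, for each gene, the weights of every selected query that contains it, and then sorting by that sum. The sketch below illustrates this with plain Python sets; the function name, the input layout and the tie-break on gene ID are assumptions for the example, not a description of the database's internals.

```python
from collections import defaultdict


def weighted_union(queries):
    """Rank genes from several queries by their summed weights.

    queries: iterable of (gene_ids, weight) pairs, where weight is a
    positive or negative integer assigned to the whole query.
    Returns a list of (gene_id, score) sorted by decreasing score.
    """
    scores = defaultdict(int)
    for gene_ids, weight in queries:
        for gene in set(gene_ids):  # count each gene once per query
            scores[gene] += weight
    # Assumed tie-break: alphabetical gene ID, to make the order stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

A negative weight lets a query count against the genes it contains, for instance to push down genes that have close human orthologs.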
Intersection: Intersection of two or more queries results in a list of genes that are common to all of the intersected queries.
Subtraction: Subtraction of two or more queries results in the removal of those genes from query-1 (which is topmost among the selected queries as listed in the My Queries section) that also occur in all of the subsequent queries that have been selected.
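Intersection and Subtraction can be illustrated with ordinary set operations. The sketch below is an assumption-laden reading of the two descriptions above: with `match_all=True`, subtraction follows the wording literally and removes only genes found in every later query, while `match_all=False` shows the looser reading in which a gene is removed if it occurs in any later query. The function and parameter names are hypothetical.

```python
def intersect_queries(queries):
    """Genes common to all of the given queries."""
    gene_sets = [set(q) for q in queries]
    return set.intersection(*gene_sets) if gene_sets else set()


def subtract_queries(queries, match_all=True):
    """Remove from the first (topmost) query the genes found in later ones."""
    gene_sets = [set(q) for q in queries]
    if not gene_sets:
        return set()
    first, rest = gene_sets[0], gene_sets[1:]
    if not rest:
        return first
    if match_all:
        # Literal reading: drop genes present in all subsequent queries.
        to_remove = set.intersection(*rest)
    else:
        # Looser reading: drop genes present in any subsequent query.
        to_remove = set.union(*rest)
    return first - to_remove
```

With exactly two selected queries the two readings give the same result.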
The results of the Union, Intersection, and Subtraction functions appear as new queries at the bottom of the list of queries in the My Queries section. These and any other queries can be renamed using the Rename function. By clicking the Show Parameters term listed to the right of the query name, users can see the pathogen species and the combination of criteria used to generate that particular query. This is useful for understanding how queries posted (or published) by other users were run, and when renaming and/or modifying the parameters of existing queries.
Delete: The Delete functionality is used to erase the selected queries from the current session. Individual queries can also be deleted by clicking on the delete term listed for each query.
Rename: The Rename functionality is used to assign a new name to the queries listed under the current session. The new name can be typed into the box provided under each query.
Exporting query results: The results of the queries listed in the My Queries section can be exported. Clicking on the Export term listed to the right of each query opens a new page where the user can select the data types to export along with the gene ID. Currently, the data types that can be exported are limited to basic gene- and protein-specific data. Other, multidimensional functional data types will be available for download in the near future. Users can also select the file format (i.e., tab-, comma- (CSV) or pipe-separated) to suit the platform they are using. When Weighted or Ranked query results are exported, the associated scores automatically appear as the rightmost column in the exported file.
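For readers who post-process exported files, the layout described above (selected columns, with the score as the rightmost column, in a tab-, comma- or pipe-separated file) can be reproduced with Python's standard csv module. The column names, the `weight` header and the function name below are placeholders, not the headers the database actually emits.

```python
import csv
import io

DELIMITERS = {"tab": "\t", "csv": ",", "pipe": "|"}


def export_ranked(rows, columns, fmt="tab"):
    """Serialize ranked results as delimited text.

    rows: iterable of (record, weight), where record is a dict of values
    columns: names of the record fields to export, in order
    fmt: one of 'tab', 'csv' or 'pipe'
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=DELIMITERS[fmt], lineterminator="\n")
    # The weight goes last, matching the rightmost-column layout.
    writer.writerow(list(columns) + ["weight"])
    for record, weight in rows:
        writer.writerow([record.get(col, "") for col in columns] + [weight])
    return buffer.getvalue()
```

Using the csv module rather than joining strings by hand means values that contain the delimiter are quoted correctly.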
Saving data: Users can save either individual queries or multiple queries that are currently listed under the My Queries section by selecting the relevant queries and then selecting and performing the Save function. While saving data (either a single query or multiple queries as a set) the user needs to provide a name for the saved dataset in the box on the right side of the Save function button. The saved query sets appear in the My Saved Query Sets section and they can be Posted (or Published), moved to the current session (My Queries section) or deleted by clicking on the corresponding terms that are listed beneath each query.
Posting query results allows all users who visit the database to view and modify copies of the posted data. To see a list of queries posted by others, users should go to the POSTED LIST OF TARGETS page. Only users who are registered and have logged in can save and post data, and users can publish only those queries that have been saved and are displayed under the My Saved Query Sets section. This is achieved by simply clicking on the Publish term listed below each query. The user is prompted to enter a name and description for the data that is being posted or published. All published queries will appear under the My Published Query Sets section. All query sets that are displayed in this section will also be displayed in the POSTED LIST OF TARGETS page. All users who visit the database will be able to view and access published queries.
Removing published data: Query sets that were previously published can be removed from the POSTED LIST OF TARGETS page only by the user who had originally published them. Users can achieve this by simply clicking on the Unpublish term listed under individual query sets in the My Published Query Sets section.
Accessing published queries for viewing and further analysis: Users can access published material by moving the selected queries from the POSTED LIST OF TARGETS page to the My Queries section on the HISTORY page. This is achieved by simply clicking on the Move To My Current Session term that is listed below each query. Users can then view, modify, assign scores for weighting and combine these queries with other queries in their current session.
As a follow-up to the HAT survey, community-wide surveys for targets in a wide variety of parasitic species that are TDR target organisms have been proposed. This page is a survey page, on which users can fill out the details of the target genes that are of interest to them. Information that can be provided here includes a free-text description of the target gene, gene ID, gene name, EC number (if applicable), whether the gene has been validated by genetic or chemical experiments, availability of a functional assay, and availability of cDNA and/or protein for experimental work. Users can also browse existing survey entries.
Query 1 (see screenshot below):
Species: Plasmodium falciparum

Query 1 Result (see screenshot below): 329 genes in total

Query 2:
Species: Plasmodium falciparum
Query 3:
Species: Plasmodium falciparum
Perform the Intersection of these three queries (see screenshot below) . . .

. . . and the result is a list of 3 genes (see screenshot below).

Query 1:
Species: Trypanosoma brucei
Query 2:
Species: Trypanosoma brucei
Query 3:
Species: Trypanosoma brucei
Query 4:
Species: Trypanosoma brucei

Perform the Union of these queries (see screenshot above) and the result is a list of 6350 genes, listed in decreasing order of cumulative Assigned Weight score (see screenshot below). Note that a few well-known T. brucei targets are listed here, including trypanothione reductase.
