Seminars – Linking gene and function by integrated approaches: how to improve the poor annotation status of sequenced genomes

EVENT : C3BI Seminars

Linking gene and function by integrated approaches: how to improve the poor annotation status of sequenced genomes


Main speaker : Valérie de Crécy-Lagard, Professor, University of Florida – Department of Microbiology and Cell Science & Genetics Institute
Date : 07/09/2017 at 02:00 pm
Location : Auditorium François Jacob – BIME (26), Institut Pasteur, Paris



Identifying the function of every gene in all sequenced organisms is the major challenge of the post-genomic era and an obligatory step for any systems biology approach. This objective is far from being reached. By various estimates, at least 30-50% of the genes of any given organism are of unknown function, incorrectly annotated, or have only a generic annotation such as “ATPase”. Moreover, with ~8,000 genomes sequenced and ~80,000 in the pipeline (http://www.genomesonline.org), the number of unknown genes is increasing and annotation errors are proliferating rapidly. For some gene families, 40% of the annotations are wrong. On the other side of the coin, there are still ~1,900 known enzyme activities for which no corresponding gene has been identified, and this number is also increasing. This biochemical knowledge is yet to be captured in genome annotations.
Using mainly a comparative genomic approach, we have linked gene and function for around 50 gene families, mostly in the fields of coenzyme metabolism, tRNA modification, protein modification and, more recently, metabolite repair. This approach integrates several types of data and uses filters, sieves, and associations to make predictions that can then be tested experimentally. An unknown gene’s function may thus be predicted from those of its associates: the ‘guilt by association’ principle. Associations that can be derived from whole-genome datasets include gene clustering, gene fusion events, phylogenetic occurrence profiles (or signatures), and shared regulatory sites. Post-genomic experimental sources such as protein interaction networks, gene expression profiles and phenomics data can also be used to find associations. In practice it is often ‘guilt by multiple association’, as genes can be associated in several ways, and analyzing more than one type of association improves the accuracy of predictions. If these comparative genomic approaches were used systematically to annotate genomes, the quality of annotations would greatly improve. Experimentalists also need to be more involved in the annotation process, because without their expert knowledge the curation effort is beyond what annotation resources such as UniProt or NCBI can achieve alone.
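
The ‘guilt by association’ idea can be illustrated with a minimal Python sketch based on phylogenetic occurrence profiles. It is not the speaker's method: the Jaccard similarity, the function names, the gene family names and the presence/absence data are all assumptions made up for illustration. Each family is described by a vector recording in which genomes it occurs, and an unknown family is ranked against the known ones by how closely its profile matches theirs.

```python
# Hypothetical presence/absence profiles across six genomes
# (1 = family present in that genome). All names and values are invented.
PROFILES = {
    "known_A": (1, 1, 0, 1, 0, 1),
    "known_B": (0, 1, 1, 0, 1, 0),
    "unknown_X": (1, 1, 0, 1, 0, 0),
}


def jaccard(p, q):
    """Jaccard similarity of two binary profiles (assumed scoring choice)."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


def rank_associates(query, profiles):
    """Rank every other family by profile similarity to the query family."""
    scores = [
        (name, jaccard(profiles[query], profile))
        for name, profile in profiles.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


ranked = rank_associates("unknown_X", PROFILES)
for name, score in ranked:
    print(f"{name}\t{score:.2f}")
```

With this toy data, known_A ranks first, which would make its pathway a candidate context for unknown_X. A real analysis would combine this signal with other evidence such as gene clustering or fusion events before any experimental test.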


Due to Institut Pasteur security policy, please register in advance if you plan to attend this seminar.

25 July 2017