Linking gene and function, comparative genomics tools for biologists

EVENT : C3BI Training

Main speaker : Valerie de Crecy-Lagard, from University of Florida · Department of Microbiology and Cell Science
Date : 25-06-2018 at 08:00 am
Location : Retrovirus room – LWOFF (22), Institut Pasteur, Paris

Students will need to bring their own laptops.

More than twenty years after the first bacterial genome was sequenced, microbiologists are faced with an avalanche of genomic data. However, the quality of the functional annotations of the sequenced proteome is very poor, with more than half of the sequenced proteins remaining of unknown function.

With nearly 80,000 whole-genome sequences available and an increasing amount of post-genomics experimental data, it is possible to gather different types of information that lead to better functional annotations and can guide the experimental process. The workshop will guide attendees through practical examples and show them an array of tools and databases that they can apply directly to their own research problems.

No prior programming experience is required: all the tools can be used through graphical user interfaces.

For background read (

Target audience: PhD students in biological sciences with a strong focus on microbiology/biochemical applications.

Instructor: Prof. Valérie de Crécy-Lagard is an expert in comparative genomics. She has been using comparative genomic methods to link gene and function for over twenty years and has developed curricula to teach integrative data mining tools at all levels.

Module organization

The training aims to enable researchers and students to master an array of web-based tools that help predict gene function. This will allow them to generate in silico functional predictions and produce illustrations for manuscripts that use comparative genomic methods.

The course lasts 5 days and uses a blend of lectures and hands-on applications.

• Module 1: Basic bioinformatics tools (day 1, morning). This module brings everyone up to date on the basic tools that will be used routinely in the course. These include data extraction from major biological databases such as NCBI and UniProt, BLAST, multiple alignments, and accessing precomputed phylogenetic trees and genome browsers.

• Module 2: Linking genes to pathways and pathways to genes (day 1, afternoon). This module focuses on pathway databases, metabolic reconstruction and models, and how mapping a gene to a pathway, or more generally to a biological system, can ground-truth a functional annotation.

• Module 3: Non-homology-based association methods (day 2, morning). This module covers physical clustering, phylogenetic distribution, whole-genome comparison, and iTOL visualization tools.

• Module 4: Paralogs, a blessing and a curse (day 2, afternoon). This module focuses on the tools and strategies used to disambiguate paralog families. This includes basic phylogenetic tree building, paralog separation tools, and building and comparing logos.

• Module 5: Regulation-based associations (day 3, morning). This module focuses on identifying regulatory sites, predicting regulatory networks, mining transcriptome data, and generating heatmaps and Venn diagrams.

• Module 6: Beyond transcriptomics, mining other types of high-throughput experimental data (day 3, afternoon). This module focuses on mining other types of experimental data: phenotype/fitness, protein interactions and complexes, localization, and metabolomics.

• Module 7: Putting it together in the MicroScope platform (day 4, all day). This module, proposed by the CEA, will cover some of the techniques discussed above using MicroScope.

• Module 8: Putting it together with student examples (day 5, all day). On the last day of class, students will have the opportunity to work on a protein family of their choice with the help of the instructors.

Due to the security policy at Institut Pasteur, please register in advance if you plan to attend this meeting.