Linking gene and function, comparative genomics tools for biologists

EVENT : C3BI Training

Main speaker : Valerie de Crecy-Lagard, from University of Florida · Department of Microbiology and Cell Science
Date : 25-06-2018 at 08:00 am
Location : Retrovirus room – LWOFF (22), Institut Pasteur, Paris

Students will need to bring their laptop.

More than twenty years after the first bacterial genome was sequenced, microbiologists are faced with an avalanche of genomic data. However, the quality of the functional annotations of the sequenced proteome is very poor, with more than half of the sequenced proteins remaining of unknown function.

With nearly 80,000 whole-genome sequences available and an increasing amount of post-genomic experimental data, it is possible to gather different types of information that lead to better functional annotations and can guide the experimental process. The workshop will guide attendees through practical examples and show them an array of tools and databases that they can apply directly to their own research problems.

No prior programming experience is required; all the tools can be used through graphical user interfaces.

For background read (

Target audience: PhD students in biological sciences with a strong focus on microbiology/biochemical applications.

Instructor: Prof. Valérie de Crécy-Lagard is an expert in comparative genomics. She has been using comparative genomic methods to link gene and function for over twenty years and has developed curricula to teach integrative data mining tools at all levels.

Module organization

The training aims to enable researchers and students to master an array of web-based tools that help predict gene function. This will allow them to generate in silico functional predictions and produce illustrations for manuscripts that use comparative genomic methods.

The course will last 5 days and uses a blend of lectures and hands-on applications.

• Module 1: Basic bioinformatics tools (day 1, morning). This module brings everyone up to date on the basic tools that will be used routinely in the course. These include data extraction from major biological databases such as NCBI and UniProt, Blast, multiple alignments, and access to precomputed phylogenetic trees and genome browsers.

• Module 2: Linking genes to pathways and pathways to genes (day 1, afternoon). This module will focus on pathway databases, metabolic reconstruction and models, and how mapping a gene to a pathway, or more generally to a biological system, can ground-truth a functional annotation.

• Module 3: Non-homology-based association methods (day 2, morning). Physical clustering, phylogenetic distribution, comparing whole genomes, iTOL visualization tools.

• Module 4: Paralogs, a blessing and a curse (day 2, afternoon). This module will focus on the tools and strategies used to disambiguate paralog families. This includes basic phylogenetic tree building, paralog separation tools, and building and comparing logos.

• Module 5: Regulatory-based associations (day 3, morning). This module focuses on identifying regulatory sites, predicting regulatory networks, mining transcriptome data, and generating heatmaps and Venn diagrams.

• Module 6: Beyond transcriptomics, mining other types of high-throughput experimental data (day 3, afternoon). This module focuses on mining other types of experimental data: phenotype/fitness, protein interactions and complexes, localization, metabolomics.

• Module 7: Putting it together in the MicroScope platform (day 4, all day). This module, proposed by the CEA, will cover some of the techniques discussed above using MicroScope :

• Module 8: Putting it together with student examples (day 5, all day). On the last day of class, students will have the opportunity to work on a protein family of their choice with the help of the instructors.

Due to Institut Pasteur security policy, please register in advance if you plan to attend this meeting.

Network Medicine: From Cellular Networks to the Human Diseasome

EVENT : C3BI Seminars

Main speaker : Albert-László Barabási, from Center for Complex Network Research, Northeastern University and Division of Network Medicine, Harvard University
Date : 12-06-2018 at 02:00 pm
Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris

Given the functional interdependencies between the molecular components in a human cell, a disease is rarely a consequence of an abnormality in a single gene, but reflects the perturbations of the complex intracellular network. The emerging tools of network medicine offer a platform to explore systematically not only the molecular complexity of a particular disease, leading to the identification of disease modules and pathways, but also the molecular relationships between apparently distinct (patho)phenotypes. Advances in this direction are essential to identify new disease genes, to uncover the biological significance of disease-associated mutations identified by genome-wide association studies and full genome sequencing, and to identify drug targets and biomarkers for complex diseases.

Albert-László Barabási is a Romanian-born Hungarian-American physicist, best known for his research on network theory. He is the former Emil T. Hofmann Professor at the University of Notre Dame and current Distinguished Professor and Director of Northeastern University’s Center for Complex Network Research (CCNR), associate member of the Center of Cancer Systems Biology (CCSB) at the Dana–Farber Cancer Institute, Harvard University, and visiting professor at the Center for Network Science at Central European University. In 1999 he introduced the concept of scale-free networks and proposed the Barabási–Albert model to explain their widespread emergence in natural, technological and social systems, from the cellular telephone to the World Wide Web or online communities. He is the Founding President of the Network Science Society, which grew out of and sponsors the flagship NetSci conference held yearly since 2006. (source: Wikipedia)
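For readers unfamiliar with the Barabási–Albert model mentioned above, the following is a minimal sketch of its growth rule (preferential attachment): each new node links to existing nodes with probability proportional to their current degree. The function name `barabasi_albert_edges`, its parameters, and the choice of starting graph are illustrative assumptions, not taken from the seminar.

```python
import random


def barabasi_albert_edges(n, m, seed=0):
    """Grow a graph to n nodes; each new node attaches to m distinct
    existing nodes picked with probability proportional to degree.
    (Hypothetical helper for illustration only.)"""
    if m < 1 or n <= m:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    # Assumption: the first new node (index m) links to nodes 0..m-1.
    targets = list(range(m))
    # 'stubs' lists every edge endpoint once, so drawing uniformly from it
    # is the same as drawing a node proportionally to its degree.
    stubs = []
    for new in range(m, n):
        for t in targets:
            edges.append((new, t))
        stubs.extend(targets)
        stubs.extend([new] * m)
        # Choose m distinct targets for the next node to be added.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(stubs))
        targets = sorted(chosen)
    return edges


if __name__ == "__main__":
    degree = {}
    for u, v in barabasi_albert_edges(1000, 2, seed=42):
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # A few early nodes typically end up as highly connected hubs.
    print(sorted(degree.values(), reverse=True)[:5])
```

Running the script prints the five largest degrees; in a scale-free network these are expected to be far above the average degree, which here is close to 4.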

Due to Institut Pasteur security policy, please register in advance if you plan to attend this meeting.

Unit Seminar – Alexander Serov & Antoine Frenoy


EVENT : C3BI Unit Seminars

Main speaker : Alexander Serov, from Decision and Bayesian computation group
Date : 15-03-2018 at 02:00 pm
Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris

Alexander Serov (Decision and Bayesian computation group) : “Robust Conservative Force Detection with the Overdamped Langevin Equation”


Antoine Frenoy (Microbial Evolutionary Genomics group) : “Death and population dynamics affect mutation rate estimates and evolvability under stress in bacteria”

Sequence analysis

Main speaker : Corinne Maufrais, from C3BI
Date : 19-03-2018 at 09:30 am
Location : Module 3 – SOCIAL BUILDING (06), Institut Pasteur, Paris

Biologists are regularly confronted with genes (or proteins) of unknown function or with poor annotations. In this context, mastering a few basic sequence analysis techniques can prove invaluable.

The aim of this course is to present some of the main principles of sequence analysis through the use of specialized websites. The course combines theoretical lectures (the methodological foundations of the programs) with practical applications (relating the theoretical concepts to the program parameters and the results obtained), so that participants can use a few biological sequence analysis programs autonomously and critically.

Classes will be held from Monday 19 March to Friday 23 March, from 9:30 am to 12:30 pm and from 2:00 pm to 5:00 pm, in MODULE 4 / Room 3, below the cafeteria.

Reply to this announcement before 9 March 2018 by sending an email to

This course is intended for anyone who wants to improve their command of sequence analysis tools in a laboratory setting: technicians, engineers, PhD students, postdocs, researchers. No computing prerequisites are required; all practical work will be done using websites, without any use of Unix or the command line. The course will be taught in French.

Day 1:
  • Morning: General introduction and organization of public databases
  • Afternoon: Searching for information in public databases
Day 2:
  • Morning: Comparison and alignment of two sequences (lecture)
  • Afternoon: Exploring sequence alignment software: dotplot, water, needle, diffseq, lastweb (practical)
Day 3:
  • Morning: Searching databases for similar sequences and using Blast (lecture + practical)
  • Afternoon: Comparison and multiple alignment of sequences with clustalO, needle, t-coffee … (lecture + practical)
Day 4:
  • Morning (lecture + practical):
    o Motif search and extraction (patterns/profiles/HMMs/logos)
    o Discovering protein families and functional domains in the PROSITE database
  • Afternoon: Introduction to genome annotation (lecture + practical)
Day 5:
  • Introduction to phylogeny (lecture + practical)



Software Engineer – 8 months contract – Institut Pasteur

EVENT : C3BI Available position

We are hiring a software engineer for an 8-month period. This person will contribute to two different software development projects:

  • software development to enable the support of CWL [1] tools and workflows on the Galaxy platform
  • development of the next version of MacSyFinder [2], a software to model and search for macromolecular systems in bacterial genomes.
Candidate profile:
  • Master's degree in computer science, software engineering, or bioinformatics,
  • Good knowledge of the Python programming language and object-oriented programming,
  • Knowledge of data serialization formats such as XML, JSON and/or YAML,
  • Working knowledge of HTTP APIs (development and usage),
  • Familiarity with bioinformatics tools and data,
  • Familiarity with integrated environments such as Galaxy (
  • Familiarity with version control systems (git) would be a plus
The successful candidate will participate in two different projects:
  • a collaboration to unify the annotation of eukaryotic transcriptomes from MMETSP and Tara Oceans Project through interoperable workflows. The goal of this project will be to work on enabling the execution of CWL workflows in Galaxy, in close collaboration with international collaborators from ELIXIR and the Galaxy community.
  • the MacSyFinder project: MacSyFinder is an open source program to model and detect macromolecular systems and molecular pathways from protein sequence datasets ( MacSyFinder has now been used by bioinformaticians for several years and will be updated:
    o to ensure its technical sustainability
    o to make it more intuitive for biologists to use
    o to add new features

The new version of MacSyFinder is developed in collaboration by two research laboratories: the Microbial Evolutionary Genomics group at the Pasteur Institute ( and the GEM group at the TIMC-IMAG department (

On the first project, the main mission is to contribute to the development of the Galaxy platform to enable the direct execution of CWL tools and workflows. On the second project, the candidate's mission will be to participate in the design and implementation of the new MacSyFinder version, through the implementation of new features, the creation of unit tests, the migration from Python 2 to Python 3, etc.

He/she will be affiliated with the “Bioinformatics Hub” and the “Microbial Evolutionary Genomics” research laboratory of the C3BI at the Institut Pasteur, and will work with the core team of engineers that works on both projects.

Please send your application (resume, supporting statement) by email to, and

[1] Amstutz P, Crusoe M R, Tijanić N, Chapman B, Chilton J, Heuer M, Kartashov A, Leehr D, Ménager H, Nedeljkovich M, Scales M, Soiland-Reyes S, Stojanovic L. Common Workflow Language v1.0. Figshare. 2016
[2] Abby, S. S., Neron, B., Menager, H., Touchon, M. & Rocha, E. P. MacSyFinder: A Program to Mine Genomes for Molecular Systems with an Application to CRISPR-Cas Systems. PLoS ONE 9, e110726 (2014).

Hierarchical functional genomics to interpret genome variation and dissect complex disease architecture



Main speaker : Emmanouil Dermitzakis, from Functional Population Genomics and Genetics of Complex Traits Laboratory (FunPopGen)
Department of Genetic Medicine and Development / Director: Health 2030 Genome Center Universite de Geneve

Date : 05-04-2018 at 02:00 pm
Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris

Molecular phenotypes inform us about genetic and environmental effects on cellular and tissue state. The elucidation of the genetic basis of gene expression and other cellular phenotypes is highly informative about the impact of genetic variants in the cell and the subsequent consequences in the organism. In this talk I will discuss recent advances in key areas of the analysis and integration of the genomics of gene expression, chromatin and cellular phenotypes in human populations and multiple tissues from various cohorts, including the GTEx consortium, and how this assists in the interpretation of regulatory networks and human disease variants. I will also discuss how these recent advances are informing us about the impact of regulatory variation in cancer.

Due to Institut Pasteur security policy, please register in advance if you plan to attend this meeting.

Learning tumor phylogenies from single-cell data

EVENT : C3BI Seminars

Main speaker : Niko Beerenwinkel, from Department of Biosystems Science and Engineering, ETH Zurich. Professor of Computational Biology.
Date : 19-04-2018 at 02:00 pm
Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris

Cancer progression is an evolutionary process characterized by the accumulation of mutations and responsible for tumor growth, clinical progression, and drug resistance development. We discuss how to reconstruct the evolutionary history of a tumor from single-cell sequencing data. The tumor phylogeny problem is challenging because of sequencing errors and the high rate of allelic drop-out in single cell DNA sequencing experiments. We present a probabilistic model and a Markov Chain Monte Carlo approach to learn tumor phylogenies from such data. We use the model to develop a statistical test of the infinite sites assumption, which is frequently made in cancer evolution. We find that the infinite sites assumption is often violated by back mutations and sometimes also by parallel mutations.
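To make the infinite sites assumption concrete: under it, each mutation arises once and is never lost, so a noise-free binary cell-by-mutation matrix must fit a perfect phylogeny. The sketch below implements the classical three-gamete condition for such matrices. It is only a combinatorial illustration and assumes error-free data with 0 as the ancestral state; it is not the probabilistic model or the statistical test presented in the talk, which account for sequencing errors and allelic drop-out. The function name `conflicting_pairs` and the example matrices are hypothetical.

```python
from itertools import combinations


def conflicting_pairs(matrix):
    """Return the pairs of mutation columns (j, k) that cannot both sit on
    one perfect phylogeny, i.e. columns in which the cells show all three
    patterns (1, 0), (0, 1) and (1, 1).

    matrix: list of rows (one per cell) of 0/1 values (one per mutation),
    assumed noise-free, with 0 meaning the ancestral (unmutated) state."""
    n_mutations = len(matrix[0]) if matrix else 0
    conflicts = []
    for j, k in combinations(range(n_mutations), 2):
        patterns = {(row[j], row[k]) for row in matrix}
        if {(1, 0), (0, 1), (1, 1)} <= patterns:
            conflicts.append((j, k))
    return conflicts


# Toy data: mutation 1 is nested inside mutation 0, mutation 2 is on
# a separate branch, so no pair conflicts.
consistent = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
]
# Toy data: the two mutations occur separately and together, which
# requires a back mutation or a parallel mutation.
violating = [
    [1, 0],
    [0, 1],
    [1, 1],
]

if __name__ == "__main__":
    print(conflicting_pairs(consistent))
    print(conflicting_pairs(violating))
```

With real single-cell data, observed conflicts can also come from false positives, false negatives, or drop-out, which is why a statistical treatment such as the one described in the abstract is needed.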

Due to Institut Pasteur security policy, please register in advance if you plan to attend this meeting.

Unit Seminar – Lucas Husquin & Jakob Ruess


EVENT : C3BI Unit Seminars

Main speaker : Lucas Husquin, from Human evolutionary genetics unit
Date : 15-02-2018 at 02:00 pm
Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris

Lucas Husquin (Human evolutionary genetics unit) : “Dissecting the impact of population variation in DNA methylation on transcriptional responses to immune activation”


Jakob Ruess (InBio : Experimental and computational methods for modeling cellular processes) : “Virtual reality for bacteria”

Some lessons learned during 30 years of biocuration activities

EVENT : C3BI Seminars

Main speaker : Pr Amos Bairoch, Computer Analysis and Laboratory Investigation of Proteins of Human Origin (CALIPHO),

University of Geneva Medical School

Date : 08-03-2018 at 02:00 pm

Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris

Amos Bairoch is a Professor of Bioinformatics at the University of Geneva. He was one of the founders of the Swiss Institute of Bioinformatics (SIB), where he leads the CALIPHO group, which relies on bioinformatics, curation, and experimental efforts to functionally characterize human proteins.

His main work is in the field of protein sequence analysis, where he contributed to the development of major databases such as the Swiss-Prot database, which later gave birth to UniProt, thus becoming the world’s most comprehensive catalogue of information on proteins. He also created the PROSITE database of protein families and domains. As a leader of the CALIPHO group (together with Lydie Lane), he is now involved in the development of the neXtProt database, which extends the UniProt database to include data from a variety of high-throughput approaches (such as micro-arrays, antibody screens, proteomics, interactomics, structural genomics), and of the Cellosaurus, a knowledge resource on cell lines. Amos Bairoch has received many highly prestigious awards such as the European Latsis Prize (2004), the Otto Naegeli prize (2010), and the proteomics pioneer award (2013).

Due to Institut Pasteur security policy, please register in advance if you plan to attend this meeting.

Inferring causality in complex molecular pathways from live cell movies

EVENT : C3BI Seminars

Main speaker : Gaudenz Danuser, from Patrick E. Haggerty Distinguished Chair in Basic Biomedical Science, Bioinformatics, Cell Biology, UT Southwestern, Dallas, USA
Date : 22-02-2018 at 02:00 pm
Location : Retrovirus room – LWOFF (22), Institut Pasteur, Paris

One of the major limitations in the study of complex molecular pathways is adaptation of the system to experimental perturbation. By ‘complex’ we mean pathways with a significant level of functional redundancy between components and nonlinear interactions. In this scenario, perturbation of one component may lead to an observable phenotype. However, it is impossible to interpret the difference between phenotype and wildtype in terms of the function the targeted component fulfills in the unperturbed system, although this is the predominant approach cell and systems biologists take to dissect molecular functions. Over the past decade my lab has made efforts to circumvent this problem by exploiting the basal fluctuations of molecular activities observed in live cell movies to establish causal functional relations between pathway components. Inspired by the accomplishments of econometrics, where predictive models are built entirely from passive observation of financial fluctuation time series, we have developed a computational framework to determine nonlinear interdependencies between pathway components. This presentation will introduce some of the mathematical, computational, and experimental concepts based on which we can now accurately delineate the functional hierarchy between signaling and mechanical processes that control cellular morphogenesis, which is a prime example of a complex molecular process.
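As a rough illustration of the idea of reading directional relations from unperturbed fluctuation time series, the sketch below scans time lags and reports the lag at which one series best predicts another by Pearson correlation. This is a deliberately simple linear stand-in and an assumption on our part; it is not the nonlinear framework described in the abstract, and a high lagged correlation alone does not establish causality. The function names and the toy series are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def best_lag(x, y, max_lag):
    """Return (lag, r): the lag in 0..max_lag that maximizes the
    correlation between x[t] and y[t + lag]."""
    scores = {}
    for lag in range(max_lag + 1):
        scores[lag] = pearson(x[: len(x) - lag], y[lag:])
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Toy example: y repeats the fluctuations of x two frames later.
x = [0, 1, 0, 2, 0, 3, 1, 0, 4, 0, 1, 2]
y = [0, 0] + x[:-2]

if __name__ == "__main__":
    lag, r = best_lag(x, y, max_lag=3)
    print(lag, round(r, 3))
```

In this toy case the activity of `x` consistently precedes that of `y`, which is the kind of temporal ordering that fluctuation-based analyses build on.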

Due to Institut Pasteur security policy, please register in advance if you plan to attend this meeting.