Signatures of ecological processes in microbial community time series

EVENT : C3BI Seminars

Main speaker : Karoline Faust, from KU Leuven
Date : 04-10-2018 at 02:00 pm
Location : Auditorium François Jacob – BIME (26), Institut Pasteur, Paris

Nowadays, a number of densely sampled microbial community time series are available, in which the abundance of community members is tracked through sequencing over several months. These data make it possible to explore community dynamics by investigating the signatures of the underlying ecological processes that are present in the time series. In this seminar, I will present our work on exploiting time series properties to distinguish between the different ecological processes behind the observed dynamics.

Due to the security policy at Institut Pasteur, please register in advance if you plan to attend this meeting.

Profiling epitranscriptomic RNA modifications by Next-Generation Sequencing

EVENT : C3BI Seminars

Main speaker : Yuri Motorin, from Ingénierie Moléculaire et Physiopathologie Articulaire (IMoPA), Université de Lorraine, Nancy
Date : 14-06-2018 at 11:00 am
Location : Auditorium Jacques Monod – MONOD (66), Institut Pasteur, Paris

RNA modifications are emerging players in the posttranscriptional regulation of gene expression, and they are attracting research interest comparable to that devoted to DNA and histone modifications in epigenetics. The functional potential of only a handful of the more than 100 known RNA modifications is currently emerging, as a consequence of a leap in detection technology driven mainly by high-throughput sequencing. In this seminar I will outline the major developments in the field, with a thorough discussion of detection principles and of the advantages and drawbacks of new high-throughput approaches, focusing in particular on 2′-O-methylations in rRNA and tRNA.

Due to the security policy at Institut Pasteur, please register in advance if you plan to attend this meeting.

Nucleotide-level analysis of genetic variation in the bacterial pan-genome

EVENT : C3BI Seminars

Main speaker : Zamin Iqbal, Royal Society/Wellcome Trust Sir Henry Dale Fellow, from EMBL-EBI
Date : 07-03-2019 at 02:00 pm
Location : Auditorium François Jacob – BIME (26), Institut Pasteur, Paris

When we study the evolution of a species, we use different models depending on what we want to achieve or infer. We might restrict ourselves to SNP variation in the “core genome” (presumably inherited vertically) to study phylogeography or an outbreak. By reducing the problem to the analysis of SNPs (and invariant sites), researchers have been able to build a range of sophisticated phylogenetic models. However, once we try to incorporate genome organisation, chromosomal rearrangements, or the movement of plasmids, transposons or phage, the modelling problem becomes far harder. The question of how to properly model bacterial genetic variation is wide open and extremely challenging.
A prerequisite for any solution is a decision on how to describe the variation in the first place: you cannot model variation until you represent it. Note that this is true even if you have perfect genome assemblies: even if it were possible to build a multiple sequence alignment of them, this would not really help us notice that a SNP at one position in one genome is “the same” as a SNP somewhere else in another.
In this talk, I want to discuss a solution we have been developing to this representation problem. We show how it is possible to represent the pan-genome of a species as a network of “floating” graphs, representing the ensemble of known variation in pathology blocks (we use genes and intergenic regions, but this could also be done for mobile elements). In doing so, it becomes possible to discover and describe genetic variation at both fine (SNP/indel) and coarse (gene order) levels.
This is a major research theme for my group, and I will describe progress to date, including results on both Illumina and nanopore data.

Due to the security policy at Institut Pasteur, please register in advance if you plan to attend this meeting.

Linking gene and function, comparative genomics tools for biologists

EVENT : C3BI Training

Main speaker : Valérie de Crécy-Lagard, from the Department of Microbiology and Cell Science, University of Florida
Date : 25-06-2018 at 08:00 am
Location : Retrovirus room – LWOFF (22), Institut Pasteur, Paris

Students will need to bring their laptops.

More than twenty years after the first bacterial genome was sequenced, microbiologists are faced with an avalanche of genomic data. However, the quality of the functional annotation of sequenced proteomes is very poor, with more than half of sequenced proteins still of unknown function.

With nearly 80,000 whole-genome sequences and an increasing amount of post-genomic experimental data available, it is possible to gather different types of information that lead to better functional annotations and can guide the experimental process. The workshop will guide attendees through practical examples and show them an array of tools and databases that they can apply directly to their own research problems.

No prior programming experience is required; all the tools can be used through graphical user interfaces.

For background read (

Target audience: PhD students in biological sciences with a strong focus on microbiological/biochemical applications.

Instructor: Prof. Valérie de Crécy-Lagard is an expert in comparative genomics. She has been using comparative genomic methods to link gene and function for over twenty years and has developed curricula to teach integrative data-mining tools at all levels.

Module organization

The training aims to enable researchers and students to master an array of web-based tools that help predict gene function. This will allow them to generate in silico functional predictions and to produce illustrations for manuscripts that use comparative genomic methods.

The course lasts five days and combines lectures with hands-on application.

• Module 1: Basic bioinformatics tools (day 1, morning). This module aims to bring everyone up to date on the basic tools that will be used routinely in the course. These include data extraction from major biological databases such as NCBI and UniProt, BLAST, multiple alignments, access to precomputed phylogenetic trees, and genome browsers.

• Module 2: Linking genes to pathways and pathways to genes (day 1, afternoon). This module will focus on pathway databases, metabolic reconstruction and models, and on how mapping a gene to a pathway, or more generally to a biological system, can ground-truth a functional annotation.

• Module 3: Non-homology-based association methods (day 2, morning). Physical clustering, phylogenetic distribution, whole-genome comparison, and iTOL visualization tools.

• Module 4: Paralogs, a blessing and a curse (day 2, afternoon). This module will focus on the tools and strategies used to disambiguate paralog families, including basic phylogenetic tree building, paralog separation tools, and building and comparing logos.

• Module 5: Regulation-based associations (day 3, morning). This module focuses on identifying regulatory sites, predicting regulatory networks, mining transcriptome data, and generating heatmaps and Venn diagrams.

• Module 6: Beyond transcriptomics, mining other types of high-throughput experimental data (day 3, afternoon). This module focuses on mining other types of experimental data: phenotype/fitness, protein interactions and complexes, localization, and metabolomics.

• Module 7: Putting it together in the MicroScope platform (day 4, all day). This module, proposed by the CEA, will cover some of the techniques discussed above using MicroScope.

• Module 8: Putting it together with student examples (day 5, all day). On the last day, students will have the opportunity to work on a protein family of their choice with the help of the instructors.

Due to the security policy at Institut Pasteur, please register in advance if you plan to attend this meeting.

Network Medicine: From Cellular Networks to the Human Diseasome

EVENT : C3BI Seminars

Main speaker : Albert-László Barabási, from the Center for Complex Network Research, Northeastern University, and the Division of Network Medicine, Harvard University
Date : 12-06-2018 at 02:00 pm
Location : Auditorium François Jacob – BIME (26), Institut Pasteur, Paris

Given the functional interdependencies between the molecular components in a human cell, a disease is rarely a consequence of an abnormality in a single gene; rather, it reflects perturbations of the complex intracellular network. The emerging tools of network medicine offer a platform to systematically explore not only the molecular complexity of a particular disease, leading to the identification of disease modules and pathways, but also the molecular relationships between apparently distinct (patho)phenotypes. Advances in this direction are essential to identify new disease genes, to uncover the biological significance of disease-associated mutations identified by genome-wide association studies and whole-genome sequencing, and to identify drug targets and biomarkers for complex diseases.

Albert-László Barabási is a Romanian-born Hungarian-American physicist, best known for his work on network theory. He is the former Emil T. Hofmann Professor at the University of Notre Dame and is currently Distinguished Professor and Director of Northeastern University’s Center for Complex Network Research (CCNR), an associate member of the Center for Cancer Systems Biology (CCSB) at the Dana–Farber Cancer Institute, Harvard University, and a visiting professor at the Center for Network Science at Central European University. In 1999 he introduced the concept of scale-free networks and proposed the Barabási–Albert model to explain their widespread emergence in natural, technological and social systems, from the cellular telephone to the World Wide Web and online communities. He is the Founding President of the Network Science Society, which grew out of and sponsors the flagship NetSci conference, held yearly since 2006. (source: Wikipedia)

Due to the security policy at Institut Pasteur, please register in advance if you plan to attend this meeting.

Unit Seminar – Alexander Serov & Antoine Frenoy


EVENT : C3BI Unit Seminars

Main speaker : Alexander Serov, from the Decision and Bayesian Computation group
Date : 15-03-2018 at 02:00 pm
Location : Auditorium François Jacob – BIME (26), Institut Pasteur, Paris

Alexander Serov (Decision and Bayesian Computation group) : “Robust Conservative Force Detection with the Overdamped Langevin Equation”


Antoine Frenoy (Microbial Evolutionary Genomics group) : “Death and population dynamics affect mutation rate estimates and evolvability under stress in bacteria”

Sequence analysis

Main speaker : Corinne Maufrais, from C3BI
Date : 19-03-2018 at 09:30 am
Location : Module 3 – SOCIAL BUILDING (06), Institut Pasteur, Paris

Biologists regularly encounter genes (or proteins) of unknown function or with poor annotation. In this context, mastering a few basic sequence analysis techniques can be of great help.

The aim of this course is to present some key principles of sequence analysis through the use of specialized websites. The course combines theoretical lectures (the methodological foundations of the programs) with practical sessions (relating the theoretical notions to the program parameters and to the results obtained), so that participants can use a number of biological sequence analysis programs independently and critically.

Classes will be held from Monday 19 March to Friday 23 March, from 9:30 am to 12:30 pm and from 2 pm to 5 pm, in MODULE 4 / Room 3, below the cafeteria.

Please respond to this announcement before 9 March 2018 by sending an email to

This course is intended for anyone who wishes to improve their command of sequence analysis tools in a laboratory context: technicians, engineers, PhD students, postdocs and researchers. No computing prerequisites are required; all practical work will be done using websites, without Unix or the command line. The course will be taught in French.

Day 1:

  • Morning: General introduction and organization of public databases
  • Afternoon: Searching for information in public databases
Day 2:
  • Morning: Comparison and alignment of two sequences (lecture)
  • Afternoon: Hands-on with sequence alignment programs: dotplot, water, needle, diffseq, lastweb (practical)
Day 3:
  • Morning: Searching databases for similar sequences and using BLAST (lecture + practical)
  • Afternoon: Multiple sequence comparison and alignment with clustalO, needle, t-coffee … (lecture + practical)
Day 4:
  • Morning (lecture + practical):
    o Motif search and extraction (pattern/profiles/HMM/logo)
    o Exploring protein families and functional domains in the PROSITE database
  • Afternoon: Introduction to genome annotation (lecture + practical)
Day 5:
  • Introduction to phylogenetics (lecture + practical)



Software Engineer – 8-month contract – Institut Pasteur

EVENT : C3BI Available position

We are hiring a software engineer for an 8-month period. This person will contribute to two different software development projects:

  • software development to support CWL [1] tools and workflows on the Galaxy platform
  • development of the next version of MacSyFinder [2], a software package to model and search for macromolecular systems in bacterial genomes.

Required profile:
  • Master’s degree in computer science, software engineering, or bioinformatics,
  • Good knowledge of the Python programming language and of object-oriented programming,
  • Knowledge of data serialization formats such as XML, JSON and/or YAML,
  • Working knowledge of HTTP APIs (development and usage),
  • Familiarity with bioinformatics tools and data,
  • Familiarity with integrated environments such as Galaxy (
  • Familiarity with a version control system (git) would be a plus
The successful candidate will participate in two different projects:
  • a collaboration to unify the annotation of eukaryotic transcriptomes from MMETSP and the Tara Oceans Project through interoperable workflows. The goal of this project will be to enable the execution of CWL workflows in Galaxy, in close collaboration with international partners from ELIXIR and the Galaxy community.
  • the MacSyFinder project: MacSyFinder is an open source program to model and detect macromolecular systems and molecular pathways from protein sequence datasets ( MacSyFinder has been used by bioinformaticians for several years and will now be updated:
    o to ensure its technical sustainability
    o to make it more intuitive for biologists to use
    o to add new features

The new version of MacSyFinder is being developed jointly by two research laboratories: the Microbial Evolutionary Genomics group at the Institut Pasteur ( and the GEM group at the TIMC-IMAG department (

On the first project, the main mission is to contribute to the development of the Galaxy platform to enable the direct execution of CWL tools and workflows. On the second project, the candidate will participate in the design and implementation of the new MacSyFinder version, through the implementation of new features, the creation of unit tests, the migration from Python 2 to Python 3, etc.

He/she will be affiliated with the “Bioinformatics Hub” and the “Microbial Evolutionary Genomics” research laboratory of the C3BI at the Institut Pasteur, and will work with the core team of engineers involved in both projects.

Please send your application (resume, supporting statement) by email to, and

[1] Amstutz P, Crusoe MR, Tijanić N, Chapman B, Chilton J, Heuer M, Kartashov A, Leehr D, Ménager H, Nedeljkovich M, Scales M, Soiland-Reyes S, Stojanovic L. Common Workflow Language, v1.0. Figshare. 2016.
[2] Abby SS, Néron B, Ménager H, Touchon M, Rocha EP. MacSyFinder: A Program to Mine Genomes for Molecular Systems with an Application to CRISPR-Cas Systems. PLoS ONE. 2014;9:e110726.

Hierarchical functional genomics to interpret genome variation and dissect complex disease architecture



Main speaker : Emmanouil Dermitzakis, from the Functional Population Genomics and Genetics of Complex Traits Laboratory (FunPopGen)
Department of Genetic Medicine and Development / Director: Health 2030 Genome Center, Université de Genève

Date : 05-04-2018 at 02:00 pm
Location : Auditorium François Jacob – BIME (26), Institut Pasteur, Paris

Molecular phenotypes inform us about genetic and environmental effects on cellular and tissue state. Elucidating the genetic basis of gene expression and other cellular phenotypes is highly informative about the impact of genetic variants in the cell and their subsequent consequences in the organism. In this talk I will discuss recent advances in key areas of the analysis and integration of the genomics of gene expression, chromatin and cellular phenotypes in human populations and multiple tissues from various cohorts, including the GTEx consortium, and how this assists in the interpretation of regulatory networks and human disease variants. I will also discuss how these advances are informing us about the impact of regulatory variation in cancer.

Due to the security policy at Institut Pasteur, please register in advance if you plan to attend this meeting.

Learning tumor phylogenies from single-cell data

EVENT : C3BI Seminars

Main speaker : Niko Beerenwinkel, Professor of Computational Biology, from the Department of Biosystems Science and Engineering, ETH Zurich
Date : 19-04-2018 at 02:00 pm
Location : Auditorium François Jacob – BIME (26), Institut Pasteur, Paris

Cancer progression is an evolutionary process characterized by the accumulation of mutations, and it is responsible for tumor growth, clinical progression, and the development of drug resistance. We discuss how to reconstruct the evolutionary history of a tumor from single-cell sequencing data. The tumor phylogeny problem is challenging because of sequencing errors and the high rate of allelic drop-out in single-cell DNA sequencing experiments. We present a probabilistic model and a Markov chain Monte Carlo approach to learn tumor phylogenies from such data. We use the model to develop a statistical test of the infinite sites assumption, which is frequently made in studies of cancer evolution. We find that the infinite sites assumption is often violated by back mutations and sometimes also by parallel mutations.

Due to the security policy at Institut Pasteur, please register in advance if you plan to attend this meeting.