Genomic enzymology web tools for functional assignment: Generating and analyzing Sequence Similarity Networks (SSNs) and Genome Neighborhood Networks (GNNs) with the EFI suite

EVENT : C3BI Seminars

Main speaker : Rémy Zallot, from University of Illinois
Date : 20-06-2019 at 02:00 pm
Location : Duclaux room down groundfloor – DUCLAUX (01), Institut Pasteur, Paris

Protein databases contain an exponentially growing number of sequences as a result of the decreasing cost and difficulty of genome sequencing. The rate of data accumulation far exceeds the rate of functional studies, producing an increase in genomic ‘dark matter’, sequences for which no precise and validated function is defined. Strategies to leverage the protein and genome databases for discovery of the functions of novel enzymes belonging to the dark matter are needed. “Genomic enzymology” is the integration of relationships among sequence-function space in protein families and the genome context of their bacterial, archaeal, and fungal members to propose function. The Enzyme Function Initiative suite of web tools includes the EFI-Enzyme Similarity Tool (EFI-EST), which generates sequence similarity networks (SSNs) for protein families, and the EFI-Genome Neighborhood Tool (EFI-GNT), which produces Genome Neighborhood Networks (GNNs) and Genome Neighborhood Diagrams (GNDs) for analyzing and visualizing the genome context of SSN clusters. Together, these tools facilitate the application of “genomic enzymology” to the ‘dark matter’ problem. A detailed overview of the principles of SSN, GNN and GND generation will be presented. The identification of an unexpected reaction in the queuosine biosynthesis pathway will illustrate the approach.
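
As a rough illustration of the principle behind an SSN (not the EFI-EST implementation), the Python sketch below treats each protein as a node, draws an edge between two proteins when their pairwise similarity score reaches a chosen threshold, and reports the resulting clusters as connected components. The protein identifiers, scores, threshold values and function names (`build_ssn`, `clusters`) are all hypothetical and chosen only for the example.

```python
def build_ssn(pairwise_scores, threshold):
    """Build an SSN as adjacency sets: one node per sequence,
    one edge per pair whose similarity score meets the threshold."""
    adjacency = {}
    for (a, b), score in pairwise_scores.items():
        # Register both nodes even if no edge is drawn between them.
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def clusters(adjacency):
    """Return the connected components of the network (iterative DFS)."""
    seen, result = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


# Hypothetical pairwise similarity scores between five proteins.
scores = {
    ("P1", "P2"): 80, ("P1", "P3"): 75, ("P2", "P3"): 90,
    ("P4", "P5"): 60, ("P1", "P4"): 12, ("P3", "P5"): 8,
}

print(clusters(build_ssn(scores, threshold=50)))
print(clusters(build_ssn(scores, threshold=85)))
```

Raising the threshold removes edges, so a family splits into smaller clusters; choosing that threshold is what lets a network separate groups of sequences that may share a function.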

Due to security policy in Institut Pasteur, please register before if you plan to come to this meeting

Bayesian matrix factorization for drug discovery and precision medicine

EVENT : C3BI Seminars

Main speaker : Yves Moreau, from Center for Computational Systems Biology, KU Leuven
Date : 31-01-2019 at 02:00 pm
Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris

Matrix factorization/completion methods provide an attractive framework to handle sparsely observed data, also called “scarce” data. A typical setting for scarce data is clinical diagnosis in a real-world setting. Not all possible symptoms (phenotype/biomarker/etc.) will have been checked for every patient. Deciding which symptom to check based on the already available information is at the heart of the diagnostic process. If genetic information about the patient is also available, it can serve as side information (covariates) to predict symptoms (phenotypes) for this patient. While a classification/regression setting is appropriate for this problem, it will typically ignore the dependencies between different tasks (i.e., symptoms). We have recently focused on a problem sharing many similarities with the diagnostic task: the prediction of biological activity of chemical compounds against drug targets, where only 0.1% to 1% of all compound-target pairs are measured. Matrix factorization searches for latent representations of compounds and targets that allow an optimal reconstruction of the observed measurements. These methods can be further combined with linear regression models to create multitask prediction models. In our case, fingerprints of chemical compounds are used as “side information” to predict target activity. By contrast with classical Quantitative Structure-Activity Relationship (QSAR) models, matrix factorization with side information naturally accommodates the multitask character of compound-target activity prediction. This methodology can be further extended to a fully Bayesian setting to handle uncertainty optimally, and our reformulation allows scaling up this MCMC scheme to millions of compounds, thousands of targets, and tens of millions of measurements, as demonstrated on a large industrial data set from a pharmaceutical company.
We also show applications of this methodology to the prioritization of candidate disease genes and to the modeling of longitudinal patient trajectories. We have implemented our method as an open source Python/C++ library, called Macau, which can be applied to many modeling tasks, well beyond our original pharmaceutical setting.
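
The core idea above, finding latent representations that reconstruct the observed entries of a sparsely filled compound-target matrix, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions only: it uses plain stochastic gradient descent with L2 regularization instead of the Bayesian MCMC scheme described in the abstract, it ignores side information, and it is not the Macau API. The activity values and the function names (`factorize`, `predict`, `rmse`) are invented for the example.

```python
import random


def predict(rows, cols, i, j):
    """Predicted activity: dot product of compound i and target j factors."""
    return sum(r * c for r, c in zip(rows[i], cols[j]))


def factorize(observed, n_rows, n_cols, n_factors=2, lr=0.01, reg=0.01,
              epochs=3000, seed=0):
    """Fit latent factors to the observed cells only, by SGD."""
    rng = random.Random(seed)
    rows = [[rng.uniform(0.0, 0.1) for _ in range(n_factors)]
            for _ in range(n_rows)]
    cols = [[rng.uniform(0.0, 0.1) for _ in range(n_factors)]
            for _ in range(n_cols)]
    cells = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(cells)
        for (i, j), value in cells:
            error = value - predict(rows, cols, i, j)
            for f in range(n_factors):
                r, c = rows[i][f], cols[j][f]
                # Gradient step on squared error plus L2 penalty.
                rows[i][f] += lr * (error * c - reg * r)
                cols[j][f] += lr * (error * r - reg * c)
    return rows, cols


def rmse(observed, rows, cols):
    """Root mean squared error over the measured cells."""
    squared = [(v - predict(rows, cols, i, j)) ** 2
               for (i, j), v in observed.items()]
    return (sum(squared) / len(squared)) ** 0.5


# Hypothetical activities: 4 compounds x 3 targets, 8 of 12 pairs measured.
observed = {
    (0, 0): 1.0, (0, 1): 2.0,
    (1, 1): 4.0, (1, 2): 1.0,
    (2, 0): 0.5, (2, 2): 0.25,
    (3, 0): 1.5, (3, 1): 3.0,
}

rows, cols = factorize(observed, n_rows=4, n_cols=3)
print("RMSE on measured pairs:", round(rmse(observed, rows, cols), 3))
print("Predicted activity for unmeasured pair (1, 0):",
      round(predict(rows, cols, 1, 0), 2))
```

Because every compound and every target gets its own latent vector, a prediction for an unmeasured pair borrows strength from all the other measurements involving that compound or that target, which is the multitask character the abstract refers to.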


Linking gene and function, comparative genomics tools for biologists

EVENT : C3BI Training

Main speaker : Valerie de Crecy-Lagard, from University of Florida · Department of Microbiology and Cell Science
Date : 17-06-2019 (08:00 am) – 21-06-2019
Location : Yersin Training room (24), Institut Pasteur, Paris

Students will need to bring their laptop.

More than twenty years after the first bacterial genome was sequenced, microbiologists are faced with an avalanche of genomic data. However, the quality of the functional annotations of the sequenced proteome is very poor, with more than half of the sequenced proteins remaining of unknown function.

With nearly 80,000 whole-genome sequences and an increasing amount of post-genomics experimental data available, it is possible to gather different types of information that lead to better functional annotations and can guide the experimental process. The workshop will guide the attendees through practical examples and show them an array of tools and databases that they can apply directly to their research problem.

No prior programming experience is required: all the tools can be used through graphical user interfaces.

For background read (

An evolutionary perspective on meiotic recombination in vertebrates

EVENT : C3BI Seminars

Main speaker : Molly Przeworski, from College de France – Columbia University
Date : 20-12-2018 at 02:00 pm
Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris

Meiotic recombination is a fundamental genetic process that generates new combinations of alleles on which natural selection can act and ensures the proper alignment and segregation of chromosomes. Recombination events are initiated by double strand breaks deliberately inflicted on the genome during meiosis. As I will discuss, in vertebrates, there appear to be two main mechanisms by which the locations of these double strand breaks are specified: through binding of the gene PRDM9 or by localization to promoter-like features of the genome. I will present our recent work linking these two mechanisms to dramatic differences in the evolutionary dynamics of recombination hotspots, and draw out potential implications for hybridization between closely related species.


Introduction to data analysis 2018-19

EVENT : C3BI Training

Main speaker : C3BI Team

Autumn session:
Date : 19-10-2018 at 09:00 am
Location : Retrovirus room – LWOFF (14), Institut Pasteur, Paris

Winter session:
Date : 11-01-2019 at 09:00 am
Location : BFJ 28-01-01A, Institut Pasteur, Paris

This course is addressed to first-year Ph.D. students from the Institut Pasteur: registration is systematic upon joining the institute. Depending on availability, second- and third-year Ph.D. students and postdocs may also apply. First-year PhD students with a background in mathematics or physics will be allowed to ask for an exemption.

The course will closely mix theory and practice. It will last four weeks, four days a week, with a three-hour lecture per day. We organize two sessions, the first one starting October 19th, 2018 and the second one starting January 11th, 2019. Each session will start with an Introduction to Computer Science to ensure that all students are familiar with essential computer science notions such as computer architecture, file system organization, file formats and programming languages. Following the statistics classes, an optional introduction to Image Analysis and Processing will be offered by the Image Analysis Hub (2 lectures).
Introduction to Computer Science module : This one-lecture module will provide students with essential computer science notions such as computer architecture, file system organization, file formats and programming languages. At the end of this lecture, there will be time left for questions regarding the configuration needed on students’ personal laptops for the Data and Image Analysis modules.

Data analysis module : The course covers a broad range of concepts that are needed for experiment design, data exploration and analysis, interpreting results and generating figures for publications. It will provide fundamental knowledge in statistics, including uni- and multi-variate descriptive analyses, usual probability distributions and their application in biology, estimation, sampling and hypothesis testing. R and RStudio will be used for practice. Students are expected to install these tools before the beginning of the course: installation instructions are provided in the first part of the R course material.

Introduction to Image analysis module : The optional two-lecture image analysis module will introduce the basic principles of image analysis, that is, how to extract quantitative information from microscopy images. The course is designed for people who have no or very little experience in the field. It will be oriented towards practical use, and short lectures will be followed by hands-on sessions and tutorials. It should help both experienced microscopists and beginners who have never had any formal training in image quantification.

The detailed program of each session is online: fall 2018, winter 2019.

In order to follow the course all students need to bring a laptop and install R on it. Please check that your computer meets the minimum requirements listed below.
  • PC – Windows based : Intel i3 / Windows 7 / 4 GB RAM / 256 GB HD
  • Apple Macintosh : mid-2010 MacBook / OS X 10.10 / 4 GB RAM / 256 GB HD
  • PC – Linux based : Intel i3 / Any distribution (supporting R >= 3.5.1, if possible) / 4 GB RAM / 256 GB HD
Instructions to install R are provided at the beginning of the R course material. The week before the course, students are invited to get their laptop checked by the C3BI teaching team if necessary.


The form below has to be filled out either to request an exemption or to apply to the course.

  • Exemptions will be granted to students already trained in biostatistics (attach a CV and a letter from the supervisor).
  • PhD students in their 2nd or 3rd year, as well as postdocs working at Pasteur Paris, may also apply.

An 18-month post-doctoral position is available in the “Chemoinformatics and Proteochemometrics” group

EVENT : C3BI Available position

Contact : Olivier Sperandio
Date : 18-09-2018
Location : Institut Pasteur, Paris

An 18-month post-doctoral position is available in the “Chemoinformatics and Proteochemometrics” group (Dr O. Sperandio) of the Structural Bioinformatics unit (Pr M. Nilges) within the Structural Biology and Chemistry department, available immediately.

Research project: Molecular modeling and protein-protein docking to characterize key molecular mechanisms that underlie the pathophysiology of osteoporosis.

The position is offered in the framework of the ANR-funded Targetbone collaborative project that brings together the complementary expertise of the groups of Professor Martine Cohen-Solal (Hôpital Lariboisière, project coordinator), Professor Giovanni Levi (Museum National d’Histoire Naturelle) and of the “Chemoinformatics and Proteochemometrics” group of Dr Olivier Sperandio at Institut Pasteur. The overall goal of the project is to provide an integrated understanding of the cellular and molecular mechanisms that underlie the pathophysiology of osteoporosis, focusing on the differentiation process of Bone Marrow Mesenchymal Stem Cells (BM-MSC) and bone marrow progenitors towards the osteoblastic lineage. Key transcription factors, playing an important role in osteogenesis, are expressed by BM-MSC and are upstream regulators of master genes involved in the induction of osteoblast differentiation. The general aim of the project is to characterize the cellular and molecular factors that promote BM-MSC differentiation by directly modifying the function of transcription factors in BM-MSCs or in more differentiated progenitors in vivo and in vitro. The contribution of our group to this project is to use molecular modeling and protein-protein docking to characterize the molecular interactions that those key transcription factors have with their known partners to promote BM-MSC differentiation at the molecular level. A tight collaboration is ongoing with the Pole Protein of Institut Pasteur for this project. This will bring valuable crystal structures to validate the modeling approach with one or several generated structures. The expected results are the functional and structural characterization of the interactions that those transcription factors make with some of their key partners in the context of osteoporosis. This opens new perspectives to identify druggable binding cavities, which will pave the way for future drug design projects.

Who are we looking for: The candidate must have a strong background in structural bioinformatics, homology modeling and protein-protein docking, ideally using techniques based on evolutionary information. The candidate should be familiar with the concept of druggable pockets and the various software packages that can profile them. The candidate must be highly motivated, have good communication skills in English, and be willing and able to work with team spirit in a highly interactive research consortium.

What are we offering: Funding for 18 months, with the possibility to extend the contract by applying for further funding. The possibility to be involved in other protein-protein docking projects, a topic in high demand on the Pasteur campus. A fruitful and highly cooperative environment with the rest of the department, the structural bioinformatics unit, and the bioinformatics center (C3BI), which include numerous talented structural biologists and bioinformaticians. Salary will be commensurate with experience according to the Institut Pasteur guidelines.

A first contact is usually established through a Skype interview, followed by an invitation to give an informal 30-minute talk to the team at the Institut Pasteur, and half a day of discussion with the members of the lab. A decision to hire is then taken after discussion with the team. Qualified applicants should send their CV, a statement of research interests and two letters of recommendation to

Hands-on microbiome data analysis: tools for understanding microbial communities in health and disease

EVENT : C3BI Training

Main speaker : Gregorio Iraola, from Institut Pasteur de Montevideo
Date : 03-12-2018 at 09:00 am
Location : Institut Pasteur de Montevideo

This course aims to provide the theoretical and practical concepts for standard bioinformatic analysis in the field of microbiome research. The course will focus on the application of state-of-the-art software tools for the analysis of environmental and host-associated microbiomes, with particular emphasis on understanding how they change or constitute a risk for human health. The course will have expert lectures and theoretical/practical data analysis sessions with real datasets.


STUDENT PREREQUISITES
  • Aimed at postgraduate (M.Sc. or Ph.D.) students.
  • Basic concepts of high-throughput sequencing technologies.
  • Basic understanding of metagenomics and microbial ecology.
  • Basic skills in the Linux terminal.



Institut Pasteur Montevideo

  • Chair: Gregorio Iraola
  • Pablo Fresia
  • Daniela Costa
  • Cecilia Salazar
  • Verónica Antelo
  • Ignacio Ferrés
  • Matias Giménez
Institut Pasteur Paris
  • Marie Lopez
  • Amine Ghozlane
  • Angèle Benard
      • Gianfranco Grompone, Discovery Microbiome, Nutrition & Health Science Lead, Lesaffre, France.
      • David Danko, Director of Bioinformatics, MetaSUB International Consortium, Weill Cornell Medicine, US

APPLICATION DEADLINE: October 19, 2018. Send your CV (one page) and letter of motivation to:


Integrated and spatial-temporal multiscale modeling of liver guide in vivo experiments in healthy & chronic disease states: a blue print for systems medicine?

EVENT : C3BI Seminars

Main speaker : Dirk Drasdo, from INRIA / IZBI Joint Research Group
Date : 20-09-2018 at 02:00 pm
Location : Salle Retrovirus – Bâtiment LWOFF, Institut Pasteur, Paris

Background and Aims: Hyperammonemia after drug-induced peri-central liver lobule damage, as from an overdose of acetaminophen (paracetamol), can lead to encephalopathy and death of the patient. Guided by mathematical models, the consensus set of chemical reactions for detoxification of liver from ammonia has recently been shown to fail in explaining ammonia detoxification after drug-induced peri-central damage (Schliess et al., 2014). Our aim is to demonstrate how integrated and spatial-temporal models mimicking detoxification of the blood from ammonia in virtual tissue samples can assist in guiding identification of missing molecular mechanisms, or predicting the impact of micro-architectural alterations due to acute or chronic damage on ammonia detoxification. Our modeling methodology is very general.

Method: The consensus and alternative detoxification mechanisms have been implemented within mathematical integrated and spatial-temporal multi-scale models to test various hypotheses on potentially missing mechanisms in ammonia detoxification during liver regeneration after drug-induced peri-central damage in silico in a virtual liver lobule (Drasdo et al., J. Hepat. 2014). The multi-scale model simulates blood flow and molecular transport in the spatial lobule micro-architecture and displays each individual hepatocyte in space and time. Detoxification reactions are executed in each virtual hepatocyte. This makes in silico testing of hypothesized mechanisms feasible from the molecular up to the tissue scale. The results are directly compared to experiments in mouse. Finally, fibrotic streets have been added to the model to predict the possible impact of architectural distortions and micro-shunts.

Results: We demonstrate how multiscale and multilevel models guided experiments towards identification of a previously unrecognized ammonia detoxification mechanism that has the potential to improve treatment of hyperammonemia (Ghallab et al., J. Hepat. 2016). The same model predicts a reduced detoxification capacity for ammonia in CCl4-induced fibrosis. Finally, we outline how the whole body scale can be included to arrive at a model spanning the molecular up to the whole body scale, permitting study of the relation of molecular changes and micro-architecture on whole body blood circulation, and briefly summarize results of integration of APAP toxic pathway as HGF signaling.

Conclusion: Refined multi-scale models increasingly permit realistic prediction of liver function as well as of toxic injury in acute and chronic damage states. Those models can integrate data from various sources: in vitro, different animal models or human data. The direct representation of liver micro-architecture in those models will open up the future perspective to feed these models with patient-specific data, hence generating a virtual twin of a patient’s liver to guide personalized diagnosis and therapy planning.


Viral phylodynamic inference: from ancient evolutionary histories to contemporary outbreaks

EVENT : C3BI Seminars

Main speaker : Philippe Lemey, from KU Leuven – Department of Microbiology and Immunology
Date : 06-09-2018 at 02:00 pm
Location : Auditorium Jacques Monod – MONOD (66), Institut Pasteur, Paris

The field of computational phylodynamics has witnessed a rich development of statistical inference tools with increasing levels of sophistication that can be applied to address a variety of questions about the evolution and epidemiology of viruses. The central premise of the field is that viruses generally evolve so rapidly that epidemic processes leave an imprint in their genomes. When focusing on deep phylogenies, a rich substitution history may confound time-measured evolutionary analyses, whereas for short-term outbreaks, it may be questioned whether the imprint provides the necessary resolution for insightful evolutionary reconstructions.

Here, I will illustrate these aspects on different viral examples. The Hepatitis B virus represents an example of a deep evolutionary history that has been difficult to date accurately using sequences sampled over the last decades. Recently, ancient DNA work has resulted in the first HBV samples dating back thousands of years. Using molecular clock modeling that accommodates time-dependent evolutionary rates, I will show how recent rapid evolutionary rate estimates can be reconciled with the long-term evolutionary dynamics of the virus.

The 2013-2016 West African Ebola epidemic marked the start of real-time genomic sequencing. Using this example, I will illustrate that short-term outbreak dynamics can be investigated using viral genome sequences, but that integrating various sources of information with genomic data promises to deliver more precise insights into infectious diseases. Finally, using recent work on Lassa virus in West Africa, I will further highlight how in-field, real-time molecular epidemiology may impact outbreak responses.


Signatures of ecological processes in microbial community time series

EVENT : C3BI Seminars

Main speaker : Karoline Faust, from KU Leuven
Date : 04-10-2018 at 02:00 pm
Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris

Nowadays, a number of densely sampled microbial community time series are available, in which the abundance of community members is tracked over several months through sequencing. These data allow exploring community dynamics by investigating signatures of underlying ecological processes that are present in the community time series. In this seminar, I will present our work on the exploitation of time series properties to distinguish between different ecological processes behind the observed dynamics.
