Extendable benchmarks and interactive exploratory analysis of single-cell RNA-seq data

EVENT : C3BI Seminars

Main speaker : Charlotte Soneson, from Friedrich Miescher Institute for Biomedical Research, Basel CH
Date : 07-11-2019 at 02:00 pm
Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris

As single-cell RNA-seq is becoming increasingly widely used, the amount and variety of public data as well as the number of computational methods available for the analysis grow quickly. Unbiased benchmarking studies are vital in order to guide users and identify strengths and weaknesses of published methods. In a rapidly changing field, it is also important that benchmarks can easily be extended to include new methods, or variants of existing methods, as they become available. In this talk, I will describe several recent studies evaluating computational methods for clustering and differential expression analysis of single-cell RNA-seq data, and specifically discuss approaches to simplify exploration of the results and inclusion of new methods and data sets. I will also present the interactive SummarizedExperiment Explorer (iSEE) R/Bioconductor package, which allows straightforward, interactive exploratory analysis of single-cell RNA-seq as well as many other types of omics data.

Due to the security policy at Institut Pasteur, please register in advance if you plan to attend this meeting.

CANCELLED : Computational Biology in the Crossroad of Big Data, Artificial Intelligence and High Performance Computing

EVENT : C3BI Seminars

Main speaker : Alfonso Valencia, from Barcelona Super Computing Center (BSC-CNS)
Date : 26-09-2019 at 02:00 pm

Due to the security policy at Institut Pasteur, please register in advance if you plan to attend this meeting.

Genomic enzymology web tools for functional assignment: Generating and analyzing Sequence Similarity Networks (SSNs) and Genome Neighborhood Networks (GNNs) with the EFI suite

EVENT : C3BI Seminars

Main speaker : Rémy Zallot, from University of Illinois
Date : 20-06-2019 at 02:00 pm
Location : Duclaux room down groundfloor – DUCLAUX (01), Institut Pasteur, Paris

Protein databases contain an exponentially growing number of sequences as a result of the decreasing cost and difficulty of genome sequencing. The rate of data accumulation far exceeds the rate of functional studies, producing an increase in genomic ‘dark matter’, sequences for which no precise and validated function is defined. Strategies to leverage the protein and genome databases for discovery of the functions of novel enzymes belonging to the dark matter are needed. “Genomic enzymology” is the integration of relationships among sequence-function space in protein families and the genome context of their bacterial, archaeal, and fungal members to propose function. The Enzyme Function Initiative suite of web tools (https://efi.igb.illinois.edu) includes the EFI-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks (SSNs) for protein families and the EFI-Genome Neighborhood Tool (EFI-GNT) for producing Genome Neighborhood Networks (GNNs) and Genome Neighborhood Diagrams (GNDs) for analyzing and visualizing the genome context of SSN clusters. Together, these tools facilitate the application of “genomic enzymology” to the ‘dark matter’ problem. A detailed overview of the principles of SSN, GNN and GND generation will be presented. The identification of an unexpected reaction in the queuosine biosynthesis pathway will illustrate the approach.
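
As a rough illustration of the general idea behind a sequence similarity network, the sketch below links two proteins whenever their pairwise similarity score reaches a chosen threshold and then reads clusters off as connected components. This is not the EFI-EST implementation: the protein identifiers, the scores, the threshold and the function names `build_ssn` and `ssn_clusters` are all made up for the example.

```python
def build_ssn(pair_scores, threshold):
    """Build an undirected graph: nodes are proteins, edges join pairs
    whose similarity score is at least `threshold`."""
    graph = {}
    for (a, b), score in pair_scores.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def ssn_clusters(graph):
    """Return the connected components of the graph as a list of sets."""
    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Hypothetical pairwise similarity scores (illustrative values only).
pair_scores = {
    ("P1", "P2"): 80,
    ("P2", "P3"): 75,
    ("P1", "P3"): 20,
    ("P3", "P4"): 10,
    ("P4", "P5"): 90,
}

graph = build_ssn(pair_scores, threshold=50)
clusters = sorted(sorted(c) for c in ssn_clusters(graph))
print(clusters)
```

Raising the threshold splits the network into more, smaller clusters, while lowering it merges them.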

Due to the security policy at Institut Pasteur, please register in advance if you plan to attend this meeting.

Bayesian matrix factorization for drug discovery and precision medicine

EVENT : C3BI Seminars

Main speaker : Yves Moreau, from Center for Computational Systems Biology, KU Leuven
Date : 31-01-2019 at 02:00 pm
Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris

Matrix factorization/completion methods provide an attractive framework to handle sparsely observed data, also called “scarce” data. A typical setting for scarce data is clinical diagnosis in a real-world setting. Not all possible symptoms (phenotype/biomarker/etc.) will have been checked for every patient. Deciding which symptom to check based on the already available information is at the heart of the diagnostic process. If genetic information about the patient is also available, it can serve as side information (covariates) to predict symptoms (phenotypes) for this patient. While a classification/regression setting is appropriate for this problem, it will typically ignore the dependencies between different tasks (i.e., symptoms). We have recently focused on a problem sharing many similarities with the diagnostic task: the prediction of biological activity of chemical compounds against drug targets, where only 0.1% to 1% of all compound-target pairs are measured. Matrix factorization searches for latent representations of compounds and targets that allow an optimal reconstruction of the observed measurements. These methods can be further combined with linear regression models to create multitask prediction models. In our case, fingerprints of chemical compounds are used as “side information” to predict target activity. By contrast with classical Quantitative Structure-Activity Relationship (QSAR) models, matrix factorization with side information naturally accommodates the multitask character of compound-target activity prediction. This methodology can be further extended to a fully Bayesian setting to handle uncertainty optimally, and our reformulation allows scaling up this MCMC scheme to millions of compounds, thousands of targets, and tens of millions of measurements, as demonstrated on a large industrial data set from a pharmaceutical company.
We also show applications of this methodology to the prioritization of candidate disease genes and to the modeling of longitudinal patient trajectories. We have implemented our method as an open source Python/C++ library, called Macau, which can be applied to many modeling tasks, well beyond our original pharmaceutical setting: https://github.com/jaak-s/macau/tree/master/python/macau
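
To make the idea of reconstructing a sparsely observed matrix from latent representations more concrete, here is a minimal sketch in plain NumPy. It is not the Macau library or its API, and it is neither Bayesian nor does it use side information: it swaps in regularized alternating least squares fitted on the observed entries only. The synthetic data, the 50% observation rate (far denser than the 0.1% to 1% mentioned above) and the function name `factorize` are assumptions chosen for illustration.

```python
import numpy as np


def factorize(R, mask, rank=2, reg=0.01, n_iters=50, seed=0):
    """Fit R ~ U @ V.T using only entries where mask is True
    (regularized alternating least squares)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = R.shape
    U = rng.normal(scale=0.1, size=(n_rows, rank))
    V = rng.normal(scale=0.1, size=(n_cols, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        # Update each row factor from the columns observed for that row.
        for i in range(n_rows):
            obs = mask[i]
            if obs.any():
                Vo = V[obs]
                U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ R[i, obs])
        # Update each column factor from the rows observed for that column.
        for j in range(n_cols):
            obs = mask[:, j]
            if obs.any():
                Uo = U[obs]
                V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ R[obs, j])
    return U, V


# Synthetic "compound x target" activity matrix of rank 2 (made-up data).
rng = np.random.default_rng(42)
R_true = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
mask = rng.random(R_true.shape) < 0.5      # True where a value was measured
R_obs = np.where(mask, R_true, 0.0)        # unmeasured entries are never used

U, V = factorize(R_obs, mask)
R_hat = U @ V.T

rmse_observed = np.sqrt(np.mean((R_hat - R_true)[mask] ** 2))
rmse_held_out = np.sqrt(np.mean((R_hat - R_true)[~mask] ** 2))
baseline = np.sqrt(np.mean(R_true[mask] ** 2))  # error of predicting zero
print(f"observed RMSE: {rmse_observed:.3f}, held-out RMSE: {rmse_held_out:.3f}")
```

Each row of `U` plays the role of a latent representation of a compound and each row of `V` that of a target; the held-out error gives a rough sense of how well unmeasured pairs are predicted in this toy setting.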

Due to the security policy at Institut Pasteur, please register in advance if you plan to attend this meeting.

Linking gene and function, comparative genomics tools for biologists

EVENT : C3BI Training

Main speaker : Valerie de Crecy-Lagard, from University of Florida · Department of Microbiology and Cell Science
Date : 17-06-2019 (08:00 am) – 21-06-2019
Location : Yersin Training room (24), Institut Pasteur, Paris

Students will need to bring their laptop.

More than twenty years after the first bacterial genome was sequenced, microbiologists are faced with an avalanche of genomic data. However, the quality of the functional annotations of the sequenced proteome is very poor, with more than half of the sequenced proteins remaining of unknown function.

With nearly 80,000 whole-genome sequences and an increasing amount of post-genomics experimental data available, it is possible to gather different types of information that lead to better functional annotations and can guide the experimental process. The workshop will guide the attendees through practical examples and show them an array of tools and databases that they can apply directly to their research problem.

No prior programming experience is required; all the tools can be used through graphical user interfaces.

For background, read https://www.ncbi.nlm.nih.gov/pubmed/20001958

An evolutionary perspective on meiotic recombination in vertebrates

EVENT : C3BI Seminars

Main speaker : Molly Przeworski, from College de France – Columbia University
Date : 20-12-2018 at 02:00 pm
Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris

Meiotic recombination is a fundamental genetic process that generates new combinations of alleles on which natural selection can act and ensures the proper alignment and segregation of chromosomes. Recombination events are initiated by double strand breaks deliberately inflicted on the genome during meiosis. As I will discuss, in vertebrates, there appear to be two main mechanisms by which the locations of these double strand breaks are specified: through binding of the gene PRDM9 or by localization to promoter-like features of the genome. I will present our recent work linking these two mechanisms to dramatic differences in the evolutionary dynamics of recombination hotspots, and draw out potential implications for hybridization between closely related species.

Due to the security policy at Institut Pasteur, please register in advance if you plan to attend this meeting.

Introduction to data analysis 2018-19

EVENT : C3BI Training

Main speaker : C3BI Team

Autumn session:
Date : 19-10-2018 at 09:00 am
Location : Retrovirus room – LWOFF (14), Institut Pasteur, Paris

Winter session:
Date : 11-01-2019 at 09:00 am
Location : BFJ 28-01-01A, Institut Pasteur, Paris

This course is addressed to first-year Ph.D. students from the Institut Pasteur: registration is systematic upon joining the institute. Depending on availability, second- and third-year Ph.D. students and postdocs may also apply. First-year PhD students with a background in mathematics or physics will be allowed to ask for an exemption.

The course will closely mix theory and practice. It will last four weeks, four days a week, with a three-hour lecture per day. We organize two sessions, the first one starting October 19th, 2018 and the second one starting January 11th, 2019. Each session will start with an Introduction to Computer Science to ensure that all students are familiar with essential computer science notions such as computer architecture, file system organization, file formats and programming languages. Following the statistics classes, an optional introduction to Image Analysis and Processing will be offered by the Image Analysis Hub (2 lectures).

Introduction to Computer Science module : This one-lecture module will provide students with essential computer science notions such as computer architecture, file system organization, file formats and programming languages. At the end of this lecture, there will be time left for questions regarding the configuration of students’ personal laptops needed for the Data and Image Analysis modules.

Data analysis module : The course covers a broad range of concepts that are needed for experiment design, data exploration and analysis, interpreting results and generating figures for publications. It will provide fundamental knowledge in statistics, including uni- and multivariate descriptive analyses, usual probability distributions and their application in biology, estimation, sampling and hypothesis testing. R and RStudio will be used for practice. Students are expected to install these tools before the beginning of the course: installation instructions are provided in the first part of the R course material.

Introduction to Image analysis module : The optional two-lecture image analysis module will introduce the basic principles of image analysis, or how to extract quantitative information from microscopy images. The course is designed for people who have no or very little experience in the field. It will be oriented towards practical use, and short lectures will be followed by hands-on sessions and tutorials. It should help both experienced microscopists and beginners who have never had any formal training in image quantification.

The detailed program of each session is online: fall 2018, winter 2019.

In order to follow the course all students need to bring a laptop and install R on it. Please check that your computer meets the minimum requirements listed below.
  • PC – Windows based : Intel i3 / Windows 7 / 4 GB RAM / 256 GB HD
  • Apple Macintosh : mid-2010 MacBook / OS X 10.10 / 4 GB RAM / 256 GB HD
  • PC – Linux based : Intel i3 / Any distribution (supporting R >= 3.5.1, if possible) / 4 GB RAM / 256 GB HD
Instructions to install R are provided at the beginning of the R course material. The week before the course, students are invited to get their laptop checked by the C3BI teaching team if necessary.


The form below has to be filled out either to request an exemption or to apply to the course.

  • Exemptions will be granted to students already trained in biostatistics (attach a CV and a letter from the supervisor).
  • Second- and third-year PhD students, as well as postdocs working at Pasteur Paris, may also apply.

An 18-month post-doctoral position is available in the “Chemoinformatics and Proteochemometrics” group

EVENT : C3BI Available position

Contact : Olivier Sperandio
Date : 18-09-2018
Location : Institut Pasteur, Paris

An 18-month post-doctoral position is available in the “Chemoinformatics and Proteochemometrics” group (Dr O. Sperandio) of the Structural Bioinformatics unit (Pr M. Nilges) within the Structural Biology and Chemistry department, available immediately.

Research project: Molecular modeling and protein-protein docking to characterize key molecular mechanisms that underlie the pathophysiology of osteoporosis.

The position is offered in the framework of the ANR-funded Targetbone collaborative project that brings together the complementary expertise of the groups of Professor Martine Cohen-Solal (Hôpital Lariboisière, project coordinator), Professor Giovanni Levi (Museum National d’Histoire Naturelle) and of the “Chemoinformatics and Proteochemometrics” group of Dr Olivier Sperandio at Institut Pasteur. The overall goal of the project is to provide an integrated understanding of the cellular and molecular mechanisms that underlie the pathophysiology of osteoporosis, focusing on the differentiation process of Bone Marrow Mesenchymal Stem Cells (BM-MSC) and bone marrow progenitors towards the osteoblastic lineage. Key transcription factors, playing an important role in osteogenesis, are expressed by BM-MSC and are upstream regulators of master genes involved in the induction of osteoblast differentiation. The general aim of the project is to characterize the cellular and molecular factors that promote BM-MSC differentiation by directly modifying the function of transcription factors in BM-MSCs or in more differentiated progenitors in vivo and in vitro.

The contribution of our group to this project is to use molecular modeling and protein-protein docking to characterize the molecular interactions that those key transcription factors have with their known partners to promote BM-MSC differentiation at the molecular level. A tight collaboration is ongoing with the Pole Protein of Institut Pasteur for this project. This will bring valuable crystal structures to validate the modeling approach with one or several generated structures. The expected results are the functional and structural characterization of the interactions that those transcription factors make with some of their key partners in the context of osteoporosis. This opens new perspectives to identify druggable binding cavities, which will pave the way for future drug design projects.

Who are we looking for: The candidate must have a strong background in structural bioinformatics, homology modeling and protein-protein docking, ideally using techniques based on evolutionary information. The candidate should be familiar with the concept of druggable pockets and the various software tools that can profile them. The candidate must be highly motivated, have good communication skills in English, and be willing and able to work with team spirit in a highly interactive research consortium.

What are we offering:
  • Funding for 18 months, with the possibility to extend the contract by applying for further funding.
  • The possibility to be involved in other protein-protein docking projects, a topic in high demand on the Pasteur campus.
  • A fruitful and highly cooperative environment with the rest of the department, the structural bioinformatics unit, and the bioinformatics center (C3BI), which include numerous talented structural biologists and bioinformaticians.
  • Salary will be commensurate with experience according to the Institut Pasteur guidelines.

A first contact is usually established through a Skype interview, followed by an invitation to give an informal 30-minute talk to the team at the Institut Pasteur, and half a day of discussion with the members of the lab. A decision to hire is then taken after discussion with the team. Qualified applicants should send their CV, a statement of research interests and two letters of recommendation to olivier.sperandio@pasteur.fr

Hands-on microbiome data analysis: tools for understanding microbial communities in health and disease

EVENT : C3BI Training

Main speaker : Gregorio Iraola, from Institut Pasteur de Montevideo
Date : 03-12-2018 at 09:00 am
Location : Institut Pasteur de Montevideo

This course aims to provide the theoretical and practical concepts for standard bioinformatic analysis in the field of microbiome research. The course will focus on the application of state-of-the-art software tools for the analysis of environmental and host-associated microbiomes, with particular emphasis on understanding how they change or constitute a risk for human health. The course will have expert lectures and theoretical/practical data analysis sessions with real datasets.


STUDENT’S PRE-REQUISITES
  • Aimed at postgraduate (M.Sc. or Ph.D.) students.
  • Basic concepts of high-throughput sequencing technologies.
  • Basic understanding of metagenomics and microbial ecology.
  • Basic skills in the Linux terminal.



Institut Pasteur Montevideo

  • Chair: Gregorio Iraola
  • Pablo Fresia
  • Daniela Costa
  • Cecilia Salazar
  • Verónica Antelo
  • Ignacio Ferrés
  • Matias Giménez
Institut Pasteur Paris
  • Marie Lopez
  • Amine Ghozlane
  • Angèle Benard
      • Gianfranco Grompone, Discovery Microbiome, Nutrition & Health Science Lead, Lesaffre, France.
      • David Danko, Director of Bioinformatics, MetaSUB International Consortium, Weill Cornell Medicine, US

APPLICATION DEADLINE: October 19, 2018. Send your CV (one page) and letter of motivation to: antonio.borderia@pasteur.fr