Genomic enzymology web tools for functional assignment: Generating and analyzing Sequence Similarity Networks (SSNs) and Genome Neighborhood Networks (GNNs) with the EFI suite

EVENT : C3BI Seminars


Main speaker : Rémy Zallot, from University of Illinois
Date : 20-06-2019 at 02:00 pm
Location : Duclaux room, ground floor – DUCLAUX (01), Institut Pasteur, Paris


Protein databases contain an exponentially growing number of sequences as a result of the decreasing cost and difficulty of genome sequencing. The rate of data accumulation far exceeds the rate of functional studies, producing an increase in genomic ‘dark matter’: sequences for which no precise and validated function is defined. Strategies that leverage the protein and genome databases to discover the functions of novel enzymes belonging to the dark matter are needed. “Genomic enzymology” is the integration of relationships among sequence-function space in protein families and the genome context of their bacterial, archaeal, and fungal members to propose function. The Enzyme Function Initiative suite of web tools (https://efi.igb.illinois.edu) includes the EFI-Enzyme Similarity Tool (EFI-EST), which generates sequence similarity networks (SSNs) for protein families, and the EFI-Genome Neighborhood Tool (EFI-GNT), which produces Genome Neighborhood Networks (GNNs) and Genome Neighborhood Diagrams (GNDs) for analyzing and visualizing the genome context of SSN clusters. Together, these tools facilitate the application of “genomic enzymology” to the ‘dark matter’ problem. A detailed overview of the principles of SSN, GNN and GND generation will be presented. The identification of an unexpected reaction in the queuosine biosynthesis pathway will illustrate the approach.
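
As a rough illustration of the SSN idea described above, the sketch below treats proteins as nodes, draws an edge whenever a pairwise similarity score reaches a chosen threshold, and reports the resulting clusters as connected components. It is not the EFI-EST implementation: the function name `ssn_clusters`, the protein identifiers and the scores are all hypothetical, and real scores would come from an all-by-all sequence comparison.

```python
def ssn_clusters(scores, threshold):
    """Group proteins into clusters of a toy sequence similarity network.

    scores: dict mapping (id_a, id_b) -> pairwise similarity score (hypothetical values).
    An edge is kept when score >= threshold; clusters are connected components.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        root_a, root_b = find(a), find(b)  # registers both nodes, even without an edge
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    # Largest clusters first, ties broken alphabetically.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


# Hypothetical pairwise scores for five proteins.
toy_scores = {
    ("P1", "P2"): 80,
    ("P2", "P3"): 75,
    ("P3", "P4"): 20,
    ("P4", "P5"): 90,
}
print(ssn_clusters(toy_scores, threshold=50))
```

Raising the threshold removes edges and splits the network into smaller clusters, so exploring a family typically means trying several thresholds.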


Due to Institut Pasteur security policy, please register in advance if you plan to attend.

New computational tools for the analysis of microbiome dynamics

EVENT : C3BI Seminars


Main speaker : Eran Halperin, from UCLA (Computer Science Department & Departments of Human Genetics, Biomathematics & Department of Anesthesiology)
Date : 25-06-2019 at 11:00 am
Location : Auditorium Jacques Monod – MONOD (66), Institut Pasteur, Paris


High-throughput microbiome analysis has become ubiquitous over the last few years. However, interpreting the data is often non-trivial and depends heavily on the methodology used for the analysis. I will describe a few methods for microbiome analysis in contexts that are typical of such studies. First, I will describe a new method for microbial source tracking, that is, finding the sources of a microbiome sample. I will demonstrate how this method can lead to very different conclusions (that make more biological sense) than previous methods. Specifically, I will show examples on the dynamics of the gut microbiome in babies, and on the use of source tracking as a tool for disease diagnosis. Second, I will discuss novel approaches for the analysis of time-series microbiome data, and here too, I will show how this new approach yields new biological insights, particularly on how the future microbiome depends on its current composition. The talk will be self-contained, and I will not assume any expert knowledge in computer science or statistics.



Interpreting the cancer genome through physical and functional models of the cancer cell

EVENT : C3BI Seminars


Main speaker : Trey Ideker, from UC San Diego – School of Medicine
Date : 21-06-2019 at 02:00 pm
Location : Jules Bordet room – METCHNIKOFF (67), Institut Pasteur, Paris


Dr. Ideker is a Professor of Medicine at UC San Diego. He is the Director of the National Resource for Network Biology, the San Diego Center for Systems Biology, and the Cancer Cell Map Initiative. He is a pioneer in using genome-scale measurements to construct network models of cellular processes and disease.

Recently we and other laboratories have launched the Cancer Cell Map Initiative (ccmi.org) and have been building momentum. The goal of the CCMI is to produce a complete map of the gene and protein wiring diagram of a cancer cell. We and others believe this map, currently missing, will be a critical component of any future system to decode a patient’s cancer genome. I will describe efforts along several lines: 1. Coalition building. We have made notable progress in building a coalition of institutions to generate the data, as well as to develop the computational methodology required to build and use the maps. 2. Development of technology for mapping gene-gene interactions rapidly using the CRISPR system. 3. Causal network maps connecting DNA mutations (somatic and germline, coding and noncoding) to the cancer events they induce downstream. 4. Development of software and database technology to visualize and store cancer cell maps. 5. A machine learning system for integrating the above data to create multi-scale models of cancer cells. In a recent paper by Ma et al., we have shown how a hierarchical map of cell structure can be embedded with a deep neural network, so that the model is able to accurately simulate the effect of mutations in genotype on the cellular phenotype.


Evolution of information in HIV-1 protease

EVENT : C3BI Seminars


Main speaker : Chris Adami, from Michigan State University
Date : 06-06-2019 at 02:00 pm
Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris


Highly active antiretroviral therapy has been extremely effective at maintaining low levels of viral load in HIV-infected individuals, but emerging drug resistance is threatening those gains. When therapy is interrupted even briefly, HIV can evolve resistance to one or multiple drugs. Understanding how to stop viral evolution is an important goal of current research. I use HIV-1 protease sequences from public databases to study the dynamics of evolution over a span of nearly ten years, and to compare patterns of adaptation in populations that are drug-naive to those that have taken one or multiple protease inhibitors. Using information theory, I show that the amount of information stored in protease sequences of patients that are on drug therapy has been increasing over time, suggesting that they are adapting to the drugs. In comparison, there is no increase in information in the sequences of patients that are drug-naive. However, for the virus the increase in information comes at a price: because most of the information is stored in correlations between residues, the sequences are evolving into a more rugged area of the fitness landscape, which could make further evolution more difficult. While the data up to 2006 do not suggest a slowing down of evolution, such a trend may exist in data from later years not analyzed here.
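
To make the information-theoretic quantities mentioned above concrete, here is a minimal sketch that computes the Shannon entropy of an alignment column and the mutual information between two columns, which is one way to measure information "stored in correlations between residues". It is not the speaker's estimator: it applies no finite-sample bias correction, and the function names and toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols observed in one alignment column."""
    n = len(column)
    return -sum((count / n) * log2(count / n) for count in Counter(column).values())


def mutual_information(col_a, col_b):
    """I(A;B) = H(A) + H(B) - H(A,B), estimated from paired observations."""
    joint = list(zip(col_a, col_b))
    return column_entropy(col_a) + column_entropy(col_b) - column_entropy(joint)


# Hypothetical columns: each string holds the residue seen at one site across four sequences.
site_1 = "AAGG"
site_2 = "CCTT"  # perfectly correlated with site_1
site_3 = "CTCT"  # independent of site_1 in this toy sample

print(column_entropy(site_1))
print(mutual_information(site_1, site_2))
print(mutual_information(site_1, site_3))
```

A site with low entropy is conserved, while high mutual information between two sites indicates that their residues co-vary across sequences.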



A population genetic interpretation of complex trait architecture in humans

EVENT : C3BI Seminars


Main speaker : Guy Sella, from Department of Biological Sciences, Columbia University
Date : 02-05-2019 at 02:00 pm
Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris


Human genome-wide association studies (GWASs) are revealing the genetic architecture of anthropometric and biomedical traits, i.e., the frequencies and effect sizes of variants that contribute to heritable variation in a trait. To interpret these findings, we need to understand how genetic architectures are shaped by basic population genetic processes, notably mutation, natural selection, and genetic drift. Because many complex traits are subject to stabilizing selection, and because genetic variation that affects one trait often affects many others, we model the genetic architecture of a focal trait that arises under stabilizing selection in a multidimensional trait space. We solve the model at steady state and find that the distribution of variances contributed by loci identified in GWASs should be well approximated by a simple functional form that depends on a single parameter: the expected contribution to genetic variance of a strongly selected site affecting the trait. This prediction fits the findings of GWASs for height and body mass index (BMI) well, allowing us to make inferences about the degree of pleiotropy and mutational target size for these traits. Our findings help to explain why the GWAS for height explains more of the heritable variance than the similarly sized GWAS for BMI, and to predict the increase in explained heritability with study size. Considering the demographic history of European populations, in which these GWASs were performed, we further find that most of the associations they identified likely involve mutations that arose during the Out-of-Africa bottleneck at sites with selection coefficients around s = 0.001.



Insights into early human migrations with modern and ancient genomic data

EVENT : C3BI Seminars


Main speaker : Anna-Sapfo Malaspinas, from Department of computational biology, Université de Lausanne
Date : 14-03-2019 at 02:00 pm
Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris


Anna-Sapfo Malaspinas has been an assistant professor in the Department of computational biology of Université de Lausanne since 2017.
Her work aims to characterize evolutionary processes (genetic drift, natural selection, migration and mutation) using genomic data from both modern and ancient samples. Her group develops analytical and computational methods to analyse and interpret time-sampled data and applies those methods to novel ancient DNA datasets. Her work allows quantification and timing of adaptive and migration events, in particular in the context of human colonization of the world.



Using Systems Approaches to Understand the Mechanism of Disease

EVENT : C3BI Seminars


Main speaker : Nevan Krogan, from Quantitative Biosciences Institute, UC San Francisco, USA
Date : 11-04-2019 at 02:00 pm
Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris


There is a wide gap between the generation of large-scale biological data sets and more detailed structural and mechanistic studies. However, recent work that explicitly combines data from systems and structural biological approaches is having a profound effect on our ability to predict how mutations and small molecules affect atomic-level mechanisms, disrupt systems-level networks and ultimately lead to changes in organismal fitness. Our group aims to create a stronger bridge between these areas, primarily using three types of data: genetic interactions, protein-protein interactions and post-translational modifications. Protein structural information helps to prioritize and functionally understand these large-scale datasets; conversely, global datasets collected in an unbiased manner help inform the more mechanistic studies. Our efforts in this respect have been focused on three disease areas: cancer, infectious diseases and neuropsychiatric disorders. Our work has found remarkable similarities between these and other disease areas, which are leading to novel therapeutic strategies.



Human gut resistome

EVENT : C3BI Seminars


Main speaker : Amine Ghozlane, from HUB, C3BI Pasteur
Date : 04-04-2019 at 02:00 pm
Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris


The intestinal microbiota is considered to be a major reservoir of antibiotic resistance determinants (ARDs) that could potentially be transferred to bacterial pathogens via mobile genetic elements. Yet, this assumption is poorly supported by empirical evidence due to the distant homologies between known ARDs (mostly from culturable bacteria) and ARDs from the intestinal microbiota. Consequently, an accurate census of intestinal ARDs (that is, the intestinal resistome) has not yet been fully determined. For this purpose, we developed and validated an annotation method (called pairwise comparative modelling) on the basis of a three-dimensional structure (homology comparative modelling), leading to the prediction of 6,095 ARDs in a catalogue of 3.9 million proteins from the human intestinal microbiota. We found that the majority of predicted ARDs (pdARDs) were distantly related to known ARDs (mean amino acid identity 29.8%) and found little evidence supporting their transfer between species. According to the composition of their resistome, we were able to cluster subjects from the MetaHIT cohort (n = 663) into six resistotypes that were connected to the previously described enterotypes. Finally, we found that the relative abundance of pdARDs was positively associated with gene richness, but not when subjects were exposed to antibiotics. Altogether, our results indicate that the majority of intestinal microbiota ARDs can be considered intrinsic to the dominant commensal microbiota and that these genes are rarely shared with bacterial pathogens.


Bayesian matrix factorization for drug discovery and precision medicine

EVENT : C3BI Seminars


Main speaker : Yves Moreau, from Center for Computational Systems Biology, KU Leuven
Date : 31-01-2019 at 02:00 pm
Location : Auditorium Francois Jacob – BIME (26), Institut Pasteur, Paris


Matrix factorization/completion methods provide an attractive framework to handle sparsely observed data, also called “scarce” data. A typical setting for scarce data is clinical diagnosis in a real-world setting. Not all possible symptoms (phenotype/biomarker/etc.) will have been checked for every patient. Deciding which symptom to check based on the already available information is at the heart of the diagnostic process. If genetic information about the patient is also available, it can serve as side information (covariates) to predict symptoms (phenotypes) for this patient. While a classification/regression setting is appropriate for this problem, it will typically ignore the dependencies between different tasks (i.e., symptoms). We have recently focused on a problem sharing many similarities with the diagnostic task: the prediction of biological activity of chemical compounds against drug targets, where only 0.1% to 1% of all compound-target pairs are measured. Matrix factorization searches for latent representations of compounds and targets that allow an optimal reconstruction of the observed measurements. These methods can be further combined with linear regression models to create multitask prediction models. In our case, fingerprints of chemical compounds are used as “side information” to predict target activity. By contrast with classical Quantitative Structure-Activity Relationship (QSAR) models, matrix factorization with side information naturally accommodates the multitask character of compound-target activity prediction. This methodology can be further extended to a fully Bayesian setting to handle uncertainty optimally, and our reformulation allows scaling up this MCMC scheme to millions of compounds, thousands of targets, and tens of millions of measurements, as demonstrated on a large industrial data set from a pharmaceutical company.
We also show applications of this methodology to the prioritization of candidate disease genes and to the modeling of longitudinal patient trajectories. We have implemented our method as an open source Python/C++ library, called Macau, which can be applied to many modeling tasks, well beyond our original pharmaceutical setting: https://github.com/jaak-s/macau/tree/master/python/macau
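
The core idea of factorizing a sparsely observed compound-target matrix can be sketched in a few lines. The example below is a deliberately simplified stand-in, not Macau and not its API: it fits latent vectors with plain stochastic gradient descent and L2 regularization, uses no side information, and is not Bayesian. The function names, hyperparameters and toy activity values are all assumptions made for illustration.

```python
import random


def factorize(observed, n_rows, n_cols, rank=2, lr=0.05, reg=0.01, epochs=2000, seed=0):
    """Fit latent vectors U[i] and V[j] so that dot(U[i], V[j]) approximates each
    observed (i, j, value) triple. Unobserved cells are simply never visited."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i, j, value in observed:
            err = value - sum(u * v for u, v in zip(U[i], V[j]))
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on squared error with L2 shrinkage.
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V


def predict(U, V, i, j):
    """Predicted value for cell (i, j), observed or not."""
    return sum(u * v for u, v in zip(U[i], V[j]))


# Hypothetical activities for 4 compounds x 3 targets; 8 of the 12 cells are measured.
observed = [
    (0, 0, 1.00), (0, 1, 0.60),
    (1, 0, 0.50), (1, 2, 0.15),
    (2, 1, 0.48), (2, 2, 0.24),
    (3, 0, 0.20), (3, 1, 0.12),
]
U, V = factorize(observed, n_rows=4, n_cols=3)
rmse = (sum((y - predict(U, V, i, j)) ** 2 for i, j, y in observed) / len(observed)) ** 0.5
print("training RMSE:", rmse)
print("prediction for unmeasured pair (0, 2):", predict(U, V, 0, 2))
```

In the approach described in the abstract, compound fingerprints would additionally enter as side information and the point estimates would be replaced by posterior samples; neither is attempted here.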



Linking gene and function, comparative genomics tools for biologists

EVENT : C3BI Training


Main speaker : Valerie de Crecy-Lagard, from University of Florida · Department of Microbiology and Cell Science
Date : 17-06-2019 (08:00 am) – 21-06-2019
Location : Yersin Training room (24), Institut Pasteur, Paris

Students will need to bring their laptop.


More than twenty years after the first bacterial genome was sequenced, microbiologists are faced with an avalanche of genomic data. However, the quality of the functional annotations of the sequenced proteome is very poor, with more than half of the sequenced proteins remaining of unknown function.

With nearly 80,000 whole-genome sequences and an increasing amount of post-genomic experimental data available, it is possible to gather different types of information that lead to better functional annotations and can guide the experimental process. The workshop will guide attendees through practical examples and show them an array of tools and databases that they can apply directly to their research problem.

No prior programming experience is required; all the tools can be used through graphical user interfaces.

For background, read: https://www.ncbi.nlm.nih.gov/pubmed/20001958