News – IFB Projects – Development of innovative bioinformatics services for life sciences

Thirty-nine projects were submitted to IFB and sixteen were accepted. Five of the accepted projects involve the C3BI.

Enhancing the CRISPRdb database and related services: CRISPR-Cas++

Leader : D. Gautheret (psud) Partners :

  • eBio, I2BC, Orsay (1 group + PF) 5 pers
  • Institut Pasteur, Paris (2 groups + PF)
  • Bilille bioinformatics platform, Université Lille 1 pers

Description :

CRISPRs are genetic loci present in bacteria and archaea which, in association with cas genes, provide defense against foreign sequences. The CRISPR-Cas system is a highly successful biotechnology tool, and CRISPR sequences are used in genotyping pathogenic bacteria. The CRISPR database and services developed at I2BC are leading international resources for CRISPR sequence analysis. These resources now need to be strengthened, and new tools and services should be added to meet growing user demand. This proposal is a collaboration between a senior member of the initial CRISPRdb team, two labs with strong expertise in cas gene sequence analysis and bacterial genotyping, and three bioinformatics platforms providing engineering support. The project is divided into two parts. First, we will improve the existing CRISPR tools in terms of database structure, search engines and interfaces, and develop a standalone version of the CRISPR finding tool to support large-scale analyses. Second, we will incorporate new features: a Cas sequence database and analysis tool, and a bacterial genotyping tool. Involving key players such as Institut Pasteur and the French Bioinformatics Institute (which will host the future web site) should provide a strong foundation for this new CRISPR-Cas resource and promote its international status.

Duration : 2 years

ARIA for hybrid structure determination on the cloud

Leader : B. Bardiaux (bis, pasteur) Partners:

  • Pasteur (6 pers)

Description:

Protein structure determination is crucial for understanding protein function, as it paves the way to the discovery of new drugs and of new approaches to control pathological biological processes. Recent advances in structural biology now allow structural information to be collected from a variety of techniques at various resolutions. Integrating such heterogeneous data to determine hybrid structures is currently a computational challenge in molecular modelling, in terms of both computing efficiency and availability of bioinformatics tools. The widely used ARIA software developed at Institut Pasteur [258 citations, Rieping W et al.(2007)] has proven very efficient in automatically determining protein structures from NMR data. In this project, we will expand the repertoire of input data types that can be used with ARIA for hybrid structure determination. In parallel, the data analysis modules of ARIA need to be bridged with other relevant structure generation engines so that these data types can be analysed. To ultimately provide a fully transparent service to the end user, we will design a web interface for ARIA and make hybridARIA freely available to the scientific community, notably through the cloud deployed by IFB.

Duration : 2 years

NGPhylogenie.fr

Leader : S. Cohen-Boulakia (lri-psud) Partners :

  • IGS (3 pers)
  • LRI (1 pers)
  • Institut Pasteur (4 pers)
  • LIRMM (2 pers)

Description :

With 50,000 data analyses per month and more than 1,500 citations (Google Scholar), the phylogenetic analysis pipeline Phylogeny.fr [1898 citations, Dereeper A et al.(2008)] is one of the most visible French IT resources at both the national and international levels. Phylogenetic analysis is performed by chaining (selected) programs together. Today, users’ needs have evolved: they may use Phylogeny.fr for teaching, which can bring hundreds of users at the same time, or employ it in batch mode, leading to the submission of large numbers of requests to the same server. These practices have overloaded our servers on several occasions. In this project, we thus plan to increase the robustness of Phylogeny.fr. The originality of the new version of Phylogeny.fr lies in adopting a scientific workflow environment (Galaxy) coupled with a web interface allowing visualization of and interaction with phylogenetic objects. More precisely, this project will provide (i) a large set of phylogenetic analysis bricks and, for each brick, access to diverse programs, all encapsulated into Galaxy, thus making the system able to deal with large groups of users and/or large sets of data, (ii) a set of optimized, robust and expressive workflows extending the basic phylogenetic workflow to various and rich contexts of phylogenetic analyses, (iii) an easy-to-install environment equipped with a new visualization layer, on top of the Galaxy system, and dedicated to phylogenetic analysis.
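The notion of a phylogenetic analysis built by chaining programs can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the Phylogeny.fr or Galaxy implementation: the step functions `align`, `curate` and `build_tree` are hypothetical toy stand-ins for the external programs a real workflow would call, and the dictionary passed between steps is an invented convention.

```python
from functools import reduce
from typing import Callable, Dict, List

# A step takes the workflow state and returns an updated copy of it.
Step = Callable[[Dict], Dict]


def align(data: Dict) -> Dict:
    # Toy stand-in for a multiple-alignment program (hypothetical):
    # pad every sequence with gaps to a common length.
    width = max(len(s) for s in data["sequences"].values())
    aligned = {name: s.ljust(width, "-") for name, s in data["sequences"].items()}
    return {**data, "alignment": aligned}


def curate(data: Dict) -> Dict:
    # Toy stand-in for alignment curation (hypothetical):
    # drop every column that contains a gap.
    names = list(data["alignment"])
    columns = zip(*(data["alignment"][n] for n in names))
    kept = [col for col in columns if "-" not in col]
    curated = {n: "".join(col[i] for col in kept) for i, n in enumerate(names)}
    return {**data, "alignment": curated}


def build_tree(data: Dict) -> Dict:
    # Toy stand-in for tree inference (hypothetical): only computes the
    # pairwise mismatch counts a distance-based method would start from.
    names = sorted(data["alignment"])
    distances = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            seq_a, seq_b = data["alignment"][a], data["alignment"][b]
            distances[(a, b)] = sum(x != y for x, y in zip(seq_a, seq_b))
    return {**data, "distances": distances}


def run_workflow(steps: List[Step], data: Dict) -> Dict:
    # Chain the steps: the output of each one is the input of the next.
    return reduce(lambda state, step: step(state), steps, data)
```

Because each step shares the same signature, swapping one program for another at a given stage amounts to replacing one entry in the list passed to `run_workflow`.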

Duration : 2 years

Provide ready-to-use Galaxy analysis environments for Life Sciences communities

Leader : O. Inizan (URGI) Partners:

  • INRA-URGI (1 pers)
  • CNRS-ABiMS (1 pers)
  • Institut Curie (1 pers)
  • IRD-southgreen (1 pers)
  • INRA-Genotoul (1 pers)
  • INRA-Migale (1 pers)
  • INRA-PFEM (1 pers)
  • INRIA-Genouest (1 pers)
  • Institut Pasteur (1 pers)

Description :

Virtualization technologies have changed the way we think about accessibility and reproducibility (A/R) in computing science. In the classical approach, A/R was achieved by distributing bioinformatics tools; we are now able to use ready-made appliances available on marketplaces hosted in a cloud. Such appliances represent an important shift because not only the tools are accessible for reproducibility, but also all the components that make up the analysis environment. As virtualization and cloud computing technologies expand, we expect that the ability to build such containers and to deploy them on heterogeneous infrastructures (desktop, cloud, medium-sized infrastructure hosted in a lab) will become a major concern for those providing services to scientists. In addition, the Galaxy platform has been very successful in several scientific communities and is becoming an important layer of environments dedicated to biological analysis. In this project, we plan to provide ready-to-use Galaxy environments for analysis to several scientific communities. The project will be organized around two axes. In the first axis, partners from several scientific communities will design representative use cases. In the second axis, the use cases will be implemented as containers and made accessible on the IFB cloud infrastructure. We expect that the technical solutions and expertise developed during the project will be reusable and useful for wider scientific communities, especially the European life science community, ELIXIR.

Duration : 2 years

Microcloud

Leader : D. Vallenet (genoscope) Partners:

  • LABGeM: (4 pers)
  • Institut Pasteur: (1pers)

Description :

MicroScope [269 citations, Vallenet D. et al.(2006)] is an integrated platform to support microbial genome (re)annotation and comparative analysis. The current project aims at designing a version of the MicroScope platform using Cloud technologies to progressively switch to a Software as a Service (SaaS) distribution mode. This technical evolution will require several adaptations of the current architecture: (i) the integration of the MicroScope components in a single appliance; (ii) the adaptation of workflows for dynamic provisioning of cluster workers; (iii) the setup of a service providing and handling the update of the required reference databanks for the different MicroScope instances. Additional functionalities will also be developed: (i) user interfaces for data and workflow management; (ii) a central repository of MicroScope genomes to allow users to share their data within the community of microbiologists. The main purpose of the project is to provide biologists with an on-demand MicroScope solution that requires no specific computational skills and only minimal user support. Furthermore, it should increase flexibility in the scale and cost of computation and storage to face the challenge of Big Data in genomics. These technological developments will be carried out as a collaboration between the CEA/Genoscope (LABGeM), the Pasteur Institute (C3BI/CIB) and the Institut Français de Bioinformatique (IFB), and could be the starting point of a new ELIXIR pilot project (www.elixir.eu) in the domain of microbial genomics.

Duration : 2 years

Training – Winter session: programming and scripting – 11 February 2016

EVENT : C3BI Training – Programming & Scripting

C3BI School : Winter session: programming and scripting


From : 11/02/2015

To : 25/02/2015


Course objective : This course is intended for anyone on campus who wishes to acquire the basics of programming and scripting useful for bioinformatics and who has difficulty finding time for training throughout the year.
Prerequisites : Ability to work on computing for eleven consecutive days, and needs confirmed by current or upcoming projects.

This training is open only to Institut Pasteur staff; registration details are available on the Pasteur intranet site.

Seminars – Human population genetics – 4 February 2016

EVENT : C3BI Seminars

 Human population genetics: genetic adaptation and epigenetic responses to environmental change


Speaker : Lluis Quintana-Murci, from Human Evolutionary Genetics Unit – Institut Pasteur

Time : 02:00 pm

Starting Date : 04/02/2016

Location : Retrovirus room – LWOFF (22), Institut Pasteur, Paris



Different environmental, demographic and selective forces, together with cultural and social characteristics of human lifestyle, shape the patterns of variability of the human genome at the population level. In particular, infectious diseases have been a major cause of human mortality, so natural selection is expected to act strongly on host defence genes. This is particularly expected for innate immunity genes, as they represent the first line of host defence against pathogens. I will present different cases of how some of these genes and the pathways they trigger have been targeted by natural selection, in its different forms and intensities, helping to distinguish genes that are important for host defence from those exhibiting higher immunological redundancy. I will also discuss how population-specific genetic variation can profoundly impact immune-related molecular phenotypes, such as mRNA and miRNA expression upon infection (expression quantitative trait loci – eQTL – mapping), and how these studies increase our understanding of immunological mechanisms under genetic control that have been crucial for our past and present survival against infection. Finally, I will discuss how the differences in lifestyle and habitat of human populations, together with their distinct patterns of genetic diversity, affect the epigenetic landscape of the human genome. Specifically, our studies of populations of African rainforest hunter-gatherers and sedentary farmers show that methylation variation associated with recent changes in habitat mostly involves immune functions, whereas that associated with historical lifestyle primarily affects developmental processes. Our work increases our understanding of whether and how populations are able to respond/adapt to environmental changes, including those related to pathogen pressures, over different time scales.

Due to Institut Pasteur security policy, please register in advance if you plan to attend this meeting

Seminars – P-Metagenomic Analysis Group – Paris – 29 January 2016

EVENT : C3BI Seminars – Large audience

P-MAG – Paris – Metagenomic Analysis Group


Speakers : Stevenn Volant, research engineer from Institut Pasteur; Amine Ghozlane, research engineer from Institut Pasteur; Etienne Ruppé, researcher from Hôpitaux Universitaires de Genève; and Eric Pelletier, researcher from Genoscope/CEA

Time : 02:00 pm till 05:00 pm

Starting Date : 29/01/2016

Location : Retrovirus room – LWOFF (22), Institut Pasteur, Paris



Registrations are now closed.

Seminars – Conservation and co-evolution – 21 January 2016

EVENT : C3BI Seminars

Conservation and co-evolution: from sequence analysis to protein-protein interactions


Speaker : Alessandra Carbone, from Laboratory of Computational and Quantitative Biology, CNRS Université Pierre et Marie Curie, Paris

Time : 02:00 pm

Starting Date : 21/01/2016

Location : Retrovirus room – LWOFF (22), Institut Pasteur, Paris



In computational biology, a fundamental question is the extraction of evolutionary information from DNA sequences. Here, we consider protein sequences and structures. Given a family of protein sequences and the associated distance tree, we shall explain how a fine reading of the conservation and co-evolution signals between residues in sequences can be used to identify protein binding sites, mechanical and allosteric properties, and protein-protein interactions. Based on this novel approach to coevolution analysis, we reconstructed the protein-protein interaction network of the Hepatitis C Virus at residue resolution. For the first time, coevolution analysis of an entire virus was carried out, based on a limited set of protein sequences with high sequence identity within genotypes. The identified coevolving residues constitute highly relevant predictions of protein-protein interactions for further experimental identification of HCV protein complexes. The method can be used to predict interactions in other viral protein interaction networks.
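To make the notions of conservation and co-evolution signals concrete, here is a minimal sketch using a generic textbook measure: Shannon entropy per alignment column for conservation, and mutual information between pairs of columns for co-variation. This is explicitly not the tree-based method presented in the talk, whose details are not given here; the function names and the threshold parameter are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple


def column(alignment: Sequence[str], i: int) -> List[str]:
    # Residues found at position i across all aligned sequences.
    return [seq[i] for seq in alignment]


def entropy(symbols: Sequence) -> float:
    # Shannon entropy in bits; 0 means the column is fully conserved.
    total = len(symbols)
    counts = Counter(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mutual_information(col_a: Sequence[str], col_b: Sequence[str]) -> float:
    # I(A;B) = H(A) + H(B) - H(A,B); high values mean the two
    # positions tend to change together across the family.
    joint = list(zip(col_a, col_b))
    return entropy(col_a) + entropy(col_b) - entropy(joint)


def covarying_pairs(alignment: Sequence[str],
                    threshold: float) -> List[Tuple[int, int, float]]:
    # Report every pair of positions whose mutual information
    # reaches the (illustrative) threshold.
    length = len(alignment[0])
    pairs = []
    for i, j in combinations(range(length), 2):
        mi = mutual_information(column(alignment, i), column(alignment, j))
        if mi >= threshold:
            pairs.append((i, j, mi))
    return pairs
```

Raw mutual information is known to be sensitive to small sample sizes and to shared ancestry among the sequences, which is one reason methods that also exploit the distance tree, as described in the abstract, are of interest.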

Due to Institut Pasteur security policy, please register in advance if you plan to attend this meeting

Seminars – Quality control of the transcription by (NMD) – 7 January 2016


EVENT : C3BI Seminars – Methodological

Quality control of the transcription by Nonsense-Mediated-mRNA Decay (NMD) revealed by TSS-RNAseq analysis


Speaker : Christophe Malabat, Research Engineer from Genetics of Macromolecular Interactions Team

Time : 02:00 pm

Starting Date : 07/01/2016

Location : Retrovirus room – LWOFF (22), Institut Pasteur, Paris


Nonsense-mediated mRNA decay (NMD) is a translation-dependent RNA quality-control pathway targeting transcripts such as messenger RNAs harbouring premature stop codons or short upstream open reading frames (uORFs). Our analysis of transcription start sites (TSSs) in Saccharomyces cerevisiae cells deficient for RNA degradation pathways revealed that about half of the pervasive transcripts are degraded by NMD, which provides a fail-safe mechanism to remove spurious transcripts that escaped degradation in the nucleus. Moreover, we found that the low specificity of TSS selection by RNA polymerase II generates, for 47% of the expressed genes, NMD-sensitive transcript isoforms carrying uORFs or starting downstream of the ATG start codon. Despite the low abundance of this last category of isoforms, their presence seems to constrain genomic sequences, as suggested by the significant bias against in-frame ATGs specifically found at the beginning of the corresponding genes and reflected by a depletion of methionines in the N-terminus of the encoded proteins.


Due to Institut Pasteur security policy, please register in advance if you plan to attend this meeting

Trainings – Leishield Training Course on Next Generation Sequencing

EVENT : C3BI Training – NGS Data Analysis

Leishield – Training Course on Next Generation Sequencing


Organizing Committee & Module coordinators : Fatma Z. Guerfali, Antonio V. Borderia, Marie-Agnès Dillies, Christophe Malabat, from Institut Pasteur Tunis & C3BI

Date : from 30/11/2015 to 04/12/2015

Location : Pacific room – CIS (22), Institut Pasteur, Paris


The primary aim of this course was to provide a basic understanding of the Leishmania genome, NGS technology and analysis tools, and to develop a basic pipeline for the students to start working on their data. This first pipeline will help to standardize all analyses performed in the consortium and should facilitate subsequent publication. This pipeline will evolve in the context of a collaboration between the Leishield partners and the C3BI, taking into account the difficulties in analyzing and interpreting the sequence data generated, and the specific needs of each Consortium node. A future workshop will be organized in June 2016 to tackle all these questions and to follow up on the analysis.


The secondary aim of this course was to establish a first contact between the C3BI and Leishield with a view to future collaboration. The course gave teachers and students fertile ground to work together and exchange ideas.

 

Seminars – Network biology and Salmonella infection mechanisms

EVENT : C3BI Seminars – Methodological

Network biology approaches to uncover the mechanisms of Salmonella infection and its autophagy modulating features in the gut


Speaker : Tamas Korcsmaros, Research Leader from TGAC, The Genome Analysis Centre, Norwich Research Park, Norwich, UK – Institute of Food Research, Norwich Research Park, Norwich, UK

Time : 02:00 pm

Starting Date : 07/12/2015

Location : Retrovirus room – LWOFF (22), Institut Pasteur, Paris


In the last decade, network approaches have emerged as a novel way of understanding how changes in cellular processes can lead to diseases, such as cancer, infection and inflammatory bowel disease (IBD). In our studies we focus on autophagy (cellular self-degradation) and its regulation. Autophagy is a stress response mechanism also important in development, immune regulation, ageing and cancer, where it can act as both pro- and anti-tumorigenic. Autophagy malfunction is also known to be related to IBD, and it is often manipulated by intestinal pathogenic bacteria, such as Salmonella.

To investigate how Salmonella modulates autophagy, we developed the first large-scale network resource for Salmonella enterica, integrating known and predicted regulatory, metabolic and signalling interactions. We investigated the variation and commonality in the networks of Salmonella serovars with either a predominantly intestinal or extra-intestinal pathogenicity. We analysed the differences (e.g., regulatory connections) for 10 strains of two niche groups (intestinal / extra-intestinal), and defined a “core” Salmonella consensus network. Then, building on the combination of previously identified Salmonella-host interactions and our recently published Autophagy Regulatory Network resource (http://autophagy-regulation.org), we predicted novel genes responsible for autophagy modulation in the gut using the intestine-specific Salmonella networks. Finally, we have designed a fluorescence reporter system to monitor autophagy of Salmonella and to validate the role of predicted genes. The bioinformatics workflows and experimental validation system we developed could be used for other strains or pathogens to iteratively predict genes for biological validation, with the potential to provide additional insight or model refinement.
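One simple way to picture a “core” consensus network and niche-specific differences is with set operations on edge lists. The sketch below is an assumption made for illustration, not the definition used in the work described: here the core is taken to be the edges present in every strain, and niche-specific edges are those shared by all strains of one group and absent from every strain of the other. Strain and gene names in any usage are placeholders.

```python
from typing import Dict, Iterable, Set, Tuple

# An interaction is a directed pair (source, target), e.g. (regulator, gene).
Edge = Tuple[str, str]


def core_network(networks: Dict[str, Set[Edge]]) -> Set[Edge]:
    # Edges present in every strain network (empty if none are given).
    edge_sets = list(networks.values())
    return set.intersection(*edge_sets) if edge_sets else set()


def group_specific(networks: Dict[str, Set[Edge]],
                   group: Iterable[str],
                   others: Iterable[str]) -> Set[Edge]:
    # Edges shared by all strains in `group` and found in no strain
    # of `others`.
    shared = core_network({strain: networks[strain] for strain in group})
    elsewhere = set().union(*(networks[strain] for strain in others))
    return shared - elsewhere
```

A stricter or looser consensus (for example, edges present in at least a given fraction of strains) would only change the body of `core_network`.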

Due to Institut Pasteur security policy, please register in advance if you plan to attend this meeting

Available Positions – C3BI 2016 – Job Application

Performance evaluation of DNA copy number segmentation methods


EVENT : C3BI Seminars – Large audience

Performance evaluation of DNA copy number segmentation methods


Speaker : Pierre Neuvial, CNRS researcher from Laboratoire de Mathématiques et Modélisation d’Evry (LaMME) – Équipe Statistique & Génome – Université d’Evry Val d’Essonne

Time : 02:00 pm

Starting Date : 03/12/2015

Location : Retrovirus room – LWOFF (22), Institut Pasteur, Paris


A number of bioinformatic or biostatistical methods are available for segmenting DNA copy number profiles measured from microarray or sequencing technologies. In the absence of rich enough gold standard data sets, the performance of these methods is generally assessed using unrealistic simulation studies, or based on small real data analyses.

In order to make an objective and reproducible performance assessment, we have designed and implemented a resampling-based framework to generate realistic DNA copy number profiles of cancer samples with known truth. In this talk, I will describe this framework and its application to a comparison study between methods for segmenting DNA copy number profiles from SNP microarrays.

This study indicates that no single method is uniformly better than all others. It also helps identify the pros and cons of the compared methods as a function of biologically informative parameters, such as the fraction of tumor cells in the sample and the proportion of heterozygous markers.

Reference: M. Pierre-Jean, G. Rigaill and P. Neuvial. Performance evaluation of DNA copy number segmentation methods. Briefings in Bioinformatics (2015). http://bib.oxfordjournals.org/content/16/4/600
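The general idea of generating profiles with known truth by resampling can be sketched as follows. This is a simplified illustration under stated assumptions, not the framework from the cited paper: it assumes that pools of real signal values have already been labelled by copy number state, and the names `simulate_profile` and `score_breakpoints`, the state labels and the tolerance parameter are all hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def simulate_profile(pools: Dict[str, Sequence[float]],
                     segments: List[Tuple[str, int]],
                     seed: int = 0) -> Tuple[List[float], List[int]]:
    # Build a profile segment by segment, drawing values with replacement
    # from the pool of observations labelled with that segment's state.
    # Breakpoint positions are therefore known by construction.
    rng = random.Random(seed)
    signal: List[float] = []
    ends: List[int] = []
    for state, length in segments:
        signal.extend(rng.choices(pools[state], k=length))
        ends.append(len(signal))
    # The last position is the end of the profile, not a breakpoint.
    return signal, ends[:-1]


def score_breakpoints(truth: Sequence[int],
                      detected: Sequence[int],
                      tolerance: int) -> Tuple[int, int]:
    # True positives: true breakpoints with a detection within `tolerance`.
    true_pos = sum(any(abs(t - d) <= tolerance for d in detected) for t in truth)
    # False positives: detections far from every true breakpoint.
    false_pos = sum(all(abs(t - d) > tolerance for t in truth) for d in detected)
    return true_pos, false_pos
```

Any segmentation method can then be run on the simulated signal and its output passed to `score_breakpoints`, so that several methods are compared on the same profiles.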


Due to Institut Pasteur security policy, please register in advance if you plan to attend this meeting