News – IFB Projects – Development of innovative bioinformatics services for life sciences

Thirty-nine projects were submitted to IFB and sixteen were accepted. Five of the accepted projects involve the C3BI.

Enhancing the CRISPRdb database and related services: CRISPR-Cas++

Leader: D. Gautheret (psud)
Partners:

  • eBio, I2BC, Orsay (1 group + PF, 5 pers)
  • Institut Pasteur, Paris (2 groups + PF)
  • Bilille bioinformatics platform, Université Lille 1 pers

Description:

CRISPRs are genetic loci present in bacteria and archaea which, together with associated cas genes, provide defense against foreign sequences. The CRISPR-Cas system is a highly successful biotechnology tool, and CRISPR sequences are used for genotyping pathogenic bacteria. The CRISPR database and services developed at I2BC are leading international resources for CRISPR sequence analysis. These resources now need to be strengthened, and new tools and services should be added to meet growing user demand. This proposal is a collaboration between a senior member of the initial CRISPRdb team, two labs with strong expertise in cas gene sequence analysis and bacterial genotyping, and three bioinformatics platforms providing engineering support. The project is divided into two parts. First, we will improve the existing CRISPR tools in terms of database structure, search engines and interfaces, and develop a standalone version of the CRISPR finding tool to support large-scale analyses. Second, we will incorporate new features: a Cas sequence database and analysis tool, and a bacterial genotyping tool. Involving key players such as Institut Pasteur and the French Bioinformatics Institute (which will host the future website) should provide a strong foundation for this new CRISPR-Cas resource and promote its international status.

Duration: 2 years

ARIA for hybrid structure determination on the cloud

Leader: B. Bardiaux (bis, Pasteur)
Partners:

  • Pasteur (6 pers)

Description:

Protein structure determination is crucial for understanding protein function, as it paves the way for the discovery of new drugs and of new approaches to control pathological biological processes. Recent advances in structural biology now allow structural information to be collected from a variety of techniques at various resolutions. Integrating such heterogeneous data to determine hybrid structures is currently a computational challenge in molecular modelling, both in terms of computing efficiency and of the availability of bioinformatics tools. The widely used ARIA software developed at Institut Pasteur [258 citations, Rieping W et al. (2007)] has proven very efficient at automatically determining protein structures from NMR data. In this project, we will expand the repertoire of input data types that can be used with ARIA for hybrid structure determination. In parallel, the data analysis modules of ARIA need to be bridged with other relevant structure generation engines so that these new data types can be analysed. To provide a fully transparent service to the end user, we will design a web interface for ARIA and make hybridARIA freely available to the scientific community, notably through the cloud deployed by IFB.

Duration: 2 years

NGPhylogenie.fr

Leader: S. Cohen-Boulakia (lri-psud)
Partners:

  • IGS (3 pers)
  • LRI (1 pers)
  • Institut Pasteur (4 pers)
  • LIRMM (2 pers)

Description:

With 50,000 data analyses per month and more than 1,500 citations (Google Scholar), the phylogenetic analysis pipeline Phylogeny.fr [1898 citations, Dereeper A et al. (2008)] is one of the most visible French IT resources at both the national and international levels. Phylogenetic analysis is performed by chaining selected programs together. Users' needs have evolved: Phylogeny.fr is now used for teaching, possibly with hundreds of users at the same time, or in batch mode, leading to the submission of large numbers of requests to the same server. These practices have overloaded our servers several times. In this project, we therefore plan to increase the robustness of Phylogeny.fr. The originality of the new version of Phylogeny.fr lies in coupling a scientific workflow environment (Galaxy) with a web interface that allows visualization of, and interaction with, phylogenetic objects. More precisely, this project will provide (i) a large set of phylogenetic analysis bricks, with access to diverse programs for each brick, all encapsulated in Galaxy so that the system can handle large groups of users and/or large datasets, (ii) a set of optimized, robust and expressive workflows extending the basic phylogenetic workflow to various and rich contexts of phylogenetic analysis, and (iii) an easy-to-install environment equipped with a new visualization layer, built on top of the Galaxy system and dedicated to phylogenetic analyses.

Duration: 2 years

Provide ready-to-use Galaxy analysis environments for Life Sciences communities

Leader: O. Inizan (URGI)
Partners:

  • INRA-URGI (1 pers)
  • CNRS-ABiMS (1 pers)
  • Institut Curie (1 pers)
  • IRD-southgreen (1 pers)
  • INRA-Genotoul (1 pers)
  • INRA-Migale (1 pers)
  • INRA-PFEM (1 pers)
  • INRIA-Genouest (1 pers)
  • Institut Pasteur (1 pers)

Description:

Virtualization technologies have shifted the way we consider accessibility and reproducibility (A/R) in computing science. Whereas A/R was classically achieved by distributing bioinformatics tools, we can now use ready-to-use appliances available on marketplaces hosted in a cloud. Such appliances represent an important shift because not only the tools but also all the components of the analysis environment become accessible for reproducibility. As virtualization and cloud computing technologies expand, we expect the ability to build such containers and to deploy them on heterogeneous infrastructures (desktop, cloud, medium-sized infrastructure hosted in a lab) to become a major topic for those providing services to scientists. Meanwhile, the Galaxy platform has met with great success in several scientific communities and is becoming an important layer of environments dedicated to biological analysis. In this project, we plan to provide ready-to-use Galaxy environments for analysis to several scientific communities. The project will be organized around two axes. In the first axis, partners from several scientific communities will design representative use cases. In the second axis, these use cases will be implemented as containers and made accessible on the IFB cloud infrastructure. We expect the technical solutions and expertise developed during the project to be reusable and useful for wider scientific communities, especially the European life science community through ELIXIR.

Duration: 2 years

Microcloud

Leader: D. Vallenet (Genoscope)
Partners:

  • LABGeM (4 pers)
  • Institut Pasteur (1 pers)

Description:

MicroScope [269 citations, Vallenet D et al. (2006)] is an integrated platform supporting microbial genome (re)annotation and comparative analysis. The current project aims to design a version of the MicroScope platform based on cloud technologies, in order to progressively switch to a Software as a Service (SaaS) distribution model. This technical evolution will require several adaptations of the current architecture: (i) integration of the MicroScope components into a single appliance, (ii) adaptation of workflows for dynamic provisioning of cluster workers, and (iii) setup of a service that provides and updates the reference databanks required by the different MicroScope instances. Additional functionalities will also be developed: (i) user interfaces for data and workflow management, and (ii) a central repository of MicroScope genomes allowing users to share their data within the microbiologist community. The main purpose of the project is to provide biologists with an on-demand MicroScope solution that requires no specific computational skills and minimal user support. It should also increase the flexibility, in both scale and cost, of computation and storage resources to face the challenge of Big Data in genomics. These technological developments will be carried out in collaboration between CEA/Genoscope (LABGeM), the Pasteur Institute (C3BI/CIB) and the Institut Français de Bioinformatique (IFB), and could be the starting point of a new ELIXIR pilot project (www.elixir.eu) in the domain of microbial genomics.

Duration: 2 years