Hub members have many areas of expertise, covering most fields in bioinformatics and biostatistics. Below is a non-exhaustive list of these areas of expertise.
Related people (6)
After a Master's degree in Genome Analysis and Molecular Modeling at Denis Diderot University, I did a PhD in NMR / bioinformatics at the same university, where I worked on the development and use of DaDiModO, a software package that uses SAXS data and RDC/NMR data to calculate models of structural proteins. After a postdoc in the Structural Bioinformatics Team at Institut Pasteur, in collaboration with IBCP, aimed at adapting the ARIA software to run on computing grids, I joined the CIB/DSI Team, where I was responsible for the development of bioinformatics projects and for the deployment, maintenance and evolution of the Pasteur Galaxy server. I joined the Hub/C3BI team in 2017 as a research engineer, where I am involved in several projects covering structural bioinformatics, software and web development. I am also in charge of the maintenance of the Galaxy Pasteur instance.
Data management, Galaxy, Structural bioinformatics, Web development, Database, Program development, Scientific computing, Databases and ontologies, Workflow and pipeline development, Grid and cloud computing
- Development of a secure API for ARIAweb (Benjamin BARDIAUX - Structural Bioinformatics) - Pending
- Development of a web server to calculate functional binding sites using Deep Learning (Olivier SPERANDIO - Structural Bioinformatics) - Pending
- A pipeline to detect correlated evolution on phylogenetic trees (Eduardo ROCHA - Microbial Evolutionary Genomics) - In Progress
Activities: Contact for any subject related to IFB. Help scientists to develop new tools (architecture, design, implementation). Lead the Python Working Group at Pasteur. O|B|F (http://www.open-bio.org/) member.
Skills: Strong programming experience in Python; software architecture and design; NoSQL databases (MongoDB, CouchDB); XML/YAML; continuous integration (GitHub/Travis-CI/Read the Docs, GitLab/GitLab-CI); containers (Docker, Singularity); Linux (Gentoo, Xubuntu); IFB developer.
Main projects on the campus: Mobyle (http://Mobyle.pasteur.fr), "Mobyle: a new full web bioinformatics framework"; IntegronFinder (ongoing project); MacsyFinder (ongoing project). My projects are available on GitHub.
Teaching: Unix (Unix-I, Unix-II), Python.
Education: 2002, PhD in Molecular and Cellular Biology, “Rôle de deux protéines QN1 et PATF impliquées dans l’arrêt de prolifération des cellules de la neurorétine aviaire au cours du developpement” (Role of two proteins, QN1 and PATF, involved in the proliferation arrest of avian neuroretina cells during development). 2001, “Informatique En Biologie” course (Pasteur).
Data management, Database, Program development, Scientific computing, Databases and ontologies
Related projects (29)
The CRISPR/Cas9 technology is a recent breakthrough in rapid genetic editing. A major part of getting the technology to work is the proper design of a guide RNA that will help Cas9 target specific genomic sequences. The design of this guide RNA must take into account all possible matches along an organism’s genome with as little as 50% similarity. Such a high tolerance for error means that current alignment algorithms are not well suited to the task. This issue leads to suboptimal guide RNA design and/or a lengthy design process. The problem is exacerbated when considering CRISPR/Cas9 for high-throughput applications. The development of a new class of sequence comparison algorithms is required.
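To make the cost of such a permissive similarity threshold concrete, here is a minimal brute-force sketch in Python. It is not the algorithm sought by the project: it simply slides the guide along one strand of a toy sequence and reports every window whose identity is at least 50%. The function names, the toy sequences, and the simplifications (Hamming distance only, no indels, no reverse strand, no PAM check) are all assumptions made for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def approximate_hits(guide: str, genome: str, min_identity: float = 0.5):
    """Yield (position, mismatches) for each window meeting min_identity.

    Brute force: every window of len(guide) is compared to the guide,
    so the cost grows with len(genome) * len(guide).
    """
    k = len(guide)
    max_mismatches = int(k * (1.0 - min_identity))
    for pos in range(len(genome) - k + 1):
        mm = hamming(guide, genome[pos:pos + k])
        if mm <= max_mismatches:
            yield pos, mm


# Toy data (hypothetical): an 8-nt "guide" and a short "genome".
guide = "ACGTACGT"
genome = "TTACGTACGTTTACGAACTTGG"
hits = list(approximate_hits(guide, genome))
for pos, mm in hits:
    print(pos, genome[pos:pos + len(guide)], mm)
```

With a threshold this loose, many windows qualify even on a tiny sequence, which illustrates why seed-based aligners that assume few mismatches are a poor fit and why exhaustive scanning does not scale to whole genomes.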
We are looking for software, whether or not based on SPADE, that runs under R and performs multiparametric clustering analysis of FCS cytometry files. We would also be interested in viSNE. The files can be converted to CSV. A commercial software package, Cytobank, exists, but it does not provide all the desired functions.
Mapping of research themes and fields of expertise available in the Institut Pasteur international Network
Using data extracted from PubMed, we would like to develop a tool for the systematic analysis of research themes and fields of expertise available in the Institut Pasteur International Network (IPIN). The tool would be available to the Pasteur community and could be queried using PubMed search terms, identifying articles involving research teams from IPIN and displaying the names of authors, research units, and locations in a visual format. We hope this tool will enable researchers to identify colleagues for sharing expertise and developing collaborations.
When dealing with high-depth read data, a simple way to combine accurate analyses with moderate computational resources is to extract a subset of raw reads with a coverage depth that is both homogeneous and moderate. Unfortunately, current implementations are often unexpectedly slow and require substantial pre-processing of large files to be used in practice. In the current scientific context, much effort must go into algorithm design and efficient programming to process large data with reasonable running times. An efficient implementation should therefore be developed in order to quickly perform read coverage homogenization. Such a tool will help to deal with highly redundant sequencing data by creating read subsets with useful properties. As the read coverage homogenization step is expected to be systematically used for pre-processing the large amount of raw reads generated in the PIBnet context, a development carried out by members of the CIB platform is expected to lead to efficient solutions that will take advantage of the computing resources hosted by the Institut Pasteur.
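As an illustration of what read coverage homogenization can look like, here is a minimal Python sketch in the style of digital normalization: a read is kept only if the median count of its k-mers, among the reads kept so far, is still below a target depth. This is one known approach, not necessarily the one the project intends to implement. The function names, the values of `k` and `target`, and the toy reads are assumptions, and an in-memory `Counter` would not scale to real datasets.

```python
from collections import Counter
from statistics import median


def kmer_set(read: str, k: int) -> set:
    """Return the distinct k-mers of a read (empty if the read is shorter than k)."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def homogenize(reads, k: int = 4, target: int = 2):
    """Keep a read only while its estimated coverage is below `target`.

    Coverage is estimated as the median count of the read's k-mers
    among the reads already kept. Reads shorter than k are dropped.
    """
    counts = Counter()
    kept = []
    for read in reads:
        ks = kmer_set(read, k)
        if not ks:
            continue
        if median(counts[x] for x in ks) < target:
            kept.append(read)
            counts.update(ks)
    return kept


# Toy data (hypothetical): one highly redundant read and one rare read.
reads = ["ACGTACGTAC"] * 10 + ["TTGGCCAATT"]
kept = homogenize(reads)
print(len(reads), "reads in,", len(kept), "reads kept")
```

The redundant read is capped at the target depth while the rare read is retained, which is the property that makes downstream analyses cheaper without discarding low-coverage regions.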
Improve two existing tools, COV2HTML (published in 2014) and SEQ2HTML (private), and transform two scripts into interactive web interfaces aimed at biologists.
Hydrogen deuterium exchange detected by mass spectrometry (HDX-MS) is a powerful technique to probe the conformation and dynamics of proteins. Over the past 10 years, the HDX-MS workflow has been optimized and automated, leading to a rapid expansion of the technology in both academic labs and pharmaceutical companies. Thanks to these improvements, modern HDX-MS can be applied to investigate more complex biological systems, including large protein complexes and membrane proteins. However, the larger the protein under study, the more complex the HDX-MS data. Several noncommercial and commercial software solutions have been developed to help with the analysis of HDX-MS data. We are currently using DynamX 3.0, a Waters-specific product designed for the nanoACQUITY UPLC system with HDX technology. The aim of the project is to design and implement a statistical tool compatible with the output generated by DynamX to readily validate results obtained with large protein complexes.
DNA topoisomerase IB (Topo IB) enzymes are ubiquitous in eukaryotes, where they represent the major DNA topoisomerase I activity. However, Topo IB sequences are also found in other phyla, such as archaea and bacteria, as well as viruses. Given the large amount of sequenced data available in public databases, this project aims to infer a robust Topo IB gene tree based on a representative set of homologous sequences gathered from a large taxonomic sample.
Provision of a Hub bioinformatician for the bioinformatics analyses of the transcriptome and epigenome
The Transcriptome and Epigenome platform develops high-throughput sequencing projects (collaboration and service) with teams on campus. These cover all of the campus's research themes as well as a wide range of organisms (from viruses to mammals). The platform carries out wet-lab activities (library construction and sequencing) and dry-lab activities (bioinformatics and statistical analyses). The person made available will interact closely with the other bioinformaticians of the BioMics pole and of the Hub. Their activities will include in particular:
- Participating in the design and setup of projects with the requesting teams, carrying out the analyses and reporting to users
- Setting up a bioinformatics analysis workflow for transcriptome/epigenome data in close collaboration with the C3BI, the DSI and the other bioinformaticians of the pole. This workflow will cover data quality control, preprocessing, mapping of sequences onto reference genomes/transcriptomes, and read counting for the various annotation features
- Adapting the analysis workflow to the biological questions and organisms studied within the platform's activities
- Technology and literature watch (testing and validating new analysis tools, updates of existing tools...)
- Setting up and developing analysis tools suited to the platform's future projects: single-cell RNA-seq, metatranscriptome, ChIP-seq, analysis of splicing isoforms... This will be done notably through dedicated analyses with certain users. The tools set up and validated in this context will then be used for all projects.
- Communication and training (participation in France Génomique consortium meetings, continuing education at the Institut Pasteur...)
- Participation in other projects of the BioMics pole (depending on availability)
Bernd Jagla, who was the platform's bioinformatician, joined the Hub on January 1, 2016. Rachel Legendre has been made available since November 2, 2015 and replaces Bernd Jagla. I would like Rachel Legendre to be made available to the platform for a period of at least 2 years.
Development and use of statistical programs to analyze RNA-Seq data produced at the Transcriptome & Epigenome Platform
The Transcriptome & Epigenome Platform is dedicated to the development and use of high-throughput approaches for transcriptomics and epigenomics studies. The platform is accessible to any research team from the Pasteur Institute (80% of the projects) as well as from outside. It is involved (most often as collaborator) in several projects funded by the ANR, AVIESAN and by the Pasteur Institute in the framework of the PTR programs. Next Generation Sequencing (NGS) based on the Illumina technology (HiSeq 2000/2500 sequencers) is used to perform RNA-sequencing experiments, which generate a large amount of data. After a first step involving bioinformatics, specific statistical methods must be used to analyze the data rigorously. These analyses are most often performed by the statistician(s) of the platform, who are also in charge of the literature survey activity.
Development and use of statistical programs to analyze RNA-Seq data produced at the Transcriptome & Epigenome Platform
The Transcriptome & Epigenome Platform is dedicated to the development and use of high-throughput approaches for transcriptomics and epigenomics studies. The platform is accessible to any research team from the Pasteur Institute (80% of the projects) as well as from outside. It is involved (most often as collaborator) in several projects funded by the ANR, Microbes and Brain, ERANET and by the Pasteur Institute in the framework of the PTR programs. Next Generation Sequencing (NGS) based on the Illumina technology (HiSeq 2000/2500 sequencers) is used to perform RNA-sequencing experiments, which generate a large amount of data. After a first step involving bioinformatics, specific statistical methods must be used to analyze the data rigorously. These analyses are most often performed by the statistician(s) of the platform, who are also in charge of the literature survey activity.
The Transcriptome and EpiGenome platform has strong expertise in the bioinformatics and statistical analysis of RNA-seq data. Nevertheless, we receive more and more requests to use NGS to characterize the epigenome (using the ChIP-seq approach) or chromatin accessibility (by ATAC-seq). We thus need to further develop and validate analysis workflows for these types of data. This project aims at developing and formalizing collaboration between the platform and some experts in this field at the Hub. This would include: joint project kick-off meetings, and development and validation of ChIP-seq and ATAC-seq analysis pipelines (notably including data preprocessing, read mapping, peak calling...).
Development of a web application and new functionalities for the maintenance and curation of iPPI-DB
A new version of the iPPI-DB, a manually curated database that contains the structure, some physicochemical characteristics, the pharmacological data and the profile of the PPI targets of several hundred modulators of protein-protein interactions.
This new version will include:
- A maintenance application that facilitates and automates the updates of the database, including the computation of the various physico-chemical properties of the modulators and chemical similarity screening on the Galaxy server of the Institut Pasteur.
- A new target-centric mode, based on the mapping of all druggable cavities at the core of PPI interfaces throughout the Protein Data Bank.
We have created an R program for the unsupervised analysis of cytometry files, and we need help optimizing this program. We also need advice on optimizing the clustering. We have already met with Hugo Varet.
The ARIA (Ambiguous Restraints for Iterative Assignment) software, developed at the Structural Bioinformatics Unit, automates the treatment of NMR data and protein structure calculation by molecular dynamics simulation. To enhance the visibility of the software, it is necessary to develop a new web interface where users will be able to easily manage their data, perform calculations and analyze the results of the ARIA calculations.
Rationale. The clinical presentation of sepsis is highly polymorphic. In septic patients presenting to emergency departments, the presence of hyperthermia or other criteria of the systemic inflammatory response syndrome (SIRS) is not sufficient to support the diagnosis of sepsis. Numerous research efforts have led to the proposal of countless sepsis biomarkers, mostly studied in intensive care. Although some, such as procalcitonin (PCT), have reached a relatively good level of prediction in the emergency department, their routine use remains controversial. Given the complex pathophysiology of sepsis, a combinatorial approach could achieve performance that is difficult to envisage with a single biomarker. Primary objective. To study the statistical performance of a panel of biomarkers of interest, individually and in combination, for the diagnosis of sepsis in the emergency department. Secondary objectives. To study the statistical performance of a panel of biomarkers of interest, individually and in combination, for the diagnosis of serious septic states (severe sepsis and septic shock) and for risk stratification (prediction of admission to intensive care and/or death). Study type. Prospective, non-interventional, single-center cohort study. Patients and inclusion criteria. 300 patients presenting to the emergency department with suspected sepsis + 30 healthy subjects. Non-inclusion criteria. Patient under 18 years of age, pregnant woman, living conditions making 28-day follow-up impossible, refusal to participate in the study. Measurements. For each patient, during the initial blood work-up, 3 tubes are collected for the later assay of a panel of biomarkers of interest exploring the different biological pathways activated during sepsis.
The goal of the project is to determine if there are differences in the midgut microbiome of our lab colonies of Aedes aegypti. We frequently observe various phenotypic differences between different colonies of mosquitoes, and it is a recurring question whether these phenotypic differences are a result of differences in the microbiome. We will sequence the microbiome of 6 representative established lab colonies that have been collected from geographically diverse areas and compare the bacterial communities between them. These data will help us dissect the importance that variation of the midgut microbiome of lab colonies of Aedes aegypti has on the phenotypic differences we observe in the lab.
Mood disorders such as bipolar and major depressive illnesses are among the most severe psychiatric disorders. They have a high prevalence and a chronic course, and are associated with significant mental and somatic comorbidities and high personal and societal costs (lost productivity and increased medical expenses). Patients with bipolar disorder (BD), for example, exhibit a reduced lifespan compared with the general population, a finding that cannot be explained solely by high suicide risk, reduced access to medical care and lifestyle factors. However, the pathophysiological mechanisms of BD are poorly understood, and patients often have an incomplete treatment response. Advanced mathematical approaches such as machine learning techniques are increasingly being used to generate predictions based on complex data, and they have been successfully used to detect a number of clinical outcomes and to predict behaviours. In combination with mobile technologies (e.g. smartphones, wearables) to collect behavioural, physiological and environmental data, these big data predictive approaches may provide a much richer and deeper understanding of the phenomenology and pathophysiological mechanisms of mood and bipolar disorders. By taking advantage of the high-standard bioinformatics expertise offered by the C3BI, this multidisciplinary, collaborative project aims to explore how clinical and biological factors may contribute to better characterizing BD patients, as well as to identify predictors of treatment response in BD. Our project also aims to explore how daily behavioural and physiological parameters may influence mood and behaviour in individuals at risk of or suffering from mood disorders.
There exists a broad biodiversity inside the Listeria monocytogenes species, which can be summarized by the existence of evolutionary lineages and more than 100 clonal complexes (CCs or clones) based
DISCO-Bac (http://disco-bac.web.pasteur.fr/), a Web server, is a part of a recent publication https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-017-3932-y (Co-authored by former Hub member
Hi-C contact maps reflect the relative contact frequencies between pairs of genomic loci, quantified through deep-sequencing. Differential analyses of these maps facilitate downstre
Development and design of new functionalities for MEMHDX, a web application dedicated to the statistical analysis and visualization of large HDX-MS datasets.
Hydrogen Deuterium eXchange followed by Mass Spectrometry (HDX-MS) is a recognized biophysical tool in structural biology capable of probing protein/ligand interactions, conformational changes, and pr
Providing correlationPlus software to the scientific community for analysis of dynamical correlations in biological macromolecules
Molecular dynamics simulations and elastic network models are two widely used computational methods for investigation of dynamics of biological macromolecules. These methods can reveal dynamical corre
Track Analyzer is a Python-based data visualization pipeline for tracking data. It does not perform any tracking, but takes as input any kind of tracked data. It analyzes trajectories by computing stand
With an estimated 10^31 particles on earth, bacteriophages are the most abundant genomic entities across all habitats and important drivers of microbial communities. Growing evidence suggests that they
Autism Spectrum Disorder (ASD), a disorder of social communication and restricted and stereotyped interests, represents a major societal challenge with its prevalence of 2.93% (Baio et al., 2018). Sin
We have recently developed and published the latest version of iPPI-DB (https://ippidb.pasteur.fr/), our database of protein-protein interaction modulators. Thanks to the group of Hervé Ménager, the da
We have developed a computer tool, named InDeep, that relies on 3D fully convolutional neural networks to predict functional binding sites at the surface of proteins. These functional binding sites ca
Last year, we developed the ARIAweb server for automated NMR structure calculation. The server has been well received by the community (200+ users and ~1400 jobs performed as of today). ARIAweb offers