Hub members have expertise covering most fields of bioinformatics and biostatistics. Below is a non-exhaustive list of these areas of expertise.
Related people (6)
Since September 2016, I have been a research engineer in the Bioinformatics and Biostatistics HUB of the Institut Pasteur, seconded to the Proteomics facility. I have a PhD in Signal Processing from the Ecole Nationale Supérieure des Télécommunications de Bretagne (Telecom Bretagne) and a Master's degree in Mathematics with a specialty in Statistical Engineering from Rennes 1 University. After my PhD, I was a research and teaching assistant in Mathematics at the Institut National des Sciences Appliquées (INSA) of Rennes, then I worked as a consultant for local public authorities at the company Ressources Consultants Finances. I started working in the field of proteomics in October 2014 in the EDyP laboratory located in Grenoble (http://www.edyp.fr/). I have been working on improving the statistical analysis of bottom-up proteomics data. Today, most of the projects I work on consist of detecting changes in protein abundance using discovery-driven mass spectrometry. I am interested in developing new methodologies to optimize proteomics data analysis pipelines, from the identification of peptides/proteins to their quantification and the interpretation of results. For this purpose, I have worked on several R packages that can be downloaded from CRAN and Bioconductor: cp4p (https://cran.r-project.org/web/packages/cp4p/index.html), imp4p (https://cran.r-project.org/web/packages/imp4p/index.html), DAPAR (http://bioconductor.org/packages/release/bioc/html/DAPAR.html) and its GUI ProStar.
Machine learning, Modeling, Pathway Analysis, Proteomics, Statistical inference, Biostatistics, Application of mathematics in sciences, Data and text mining, Data integration, Statistical experiment design, Multidimensional data analysis
After a Master's degree in Genome Analysis and Molecular Modeling at Denis Diderot University, I did a PhD in NMR / bioinformatics at the same university, where I worked on the development and use of software named DaDiModO, which uses SAXS data and RDC/NMR data to calculate models of structural proteins. After a postdoc in the Structural Bioinformatics Team at Institut Pasteur, in collaboration with IBCP, aimed at adapting the ARIA software to run on a computing grid, I joined the CIB/DSI Team, where I was responsible for the development of bioinformatics projects and the deployment, maintenance and evolution of the Pasteur Galaxy server. I joined the Hub/C3BI team in 2017 as a research engineer, where I am involved in several projects covering structural bioinformatics, software and web development. I am also in charge of the maintenance of the Galaxy Pasteur instance.
Data management, Structural bioinformatics, Database, Program development, Scientific computing, Databases and ontologies, Grid and cloud computing
- Integration of bioinformatics tools into Galaxy for bacterial identification (ANNE LE FLECHE - Department of Infection & Epidemiology, Environment and Infectious Risks) - In Progress
- Identification of APOBEC3 mutations in cancer genomes (Vincent CAVAL - Molecular Retrovirology) - In Progress
- Assembly of insect virus genome (Karin EIGLMEIER - Genetic and Genomics of Insects Vectors) - In Progress
Activities: Contact for any subject related to IFB. Help scientists develop new tools (architecture, design, implementation). Lead the Python Working Group at Pasteur. O|B|F (http://www.open-bio.org/) member.
Skills: Strong programming experience in Python. Software architecture and design. NoSQL databases (MongoDB, CouchDB). XML/YAML. Continuous integration (GitHub/Travis CI/Read the Docs, GitLab/GitLab CI). Containers (Docker, Singularity). Linux (Gentoo, Xubuntu). IFB developer.
Main projects on the campus: Mobyle (http://Mobyle.pasteur.fr), "Mobyle: a new full web bioinformatics framework"; IntegronFinder (ongoing project); MacsyFinder (ongoing project). Access to my projects on GitHub.
Teaching: Unix (Unix-I, Unix-II), Python.
Education: 2002, PhD in Molecular and Cellular Biology: “Rôle de deux protéines QN1 et PATF impliquées dans l’arrêt de prolifération des cellules de la neurorétine aviaire au cours du developpement”. 2001, “Informatique En Biologie” course (Pasteur).
Data management, Database, Program development, Scientific computing, Databases and ontologies
- Move of the DISCO-BAC server VM to the new DSI infrastructure (Benno SCHWIKOWSKI - Systems Biology) - Closed
- Genetic and statistical analysis of data produced with the Collaborative Cross at the Institut Pasteur (Xavier MONTAGUTELLI - Mouse Genetics) - In Progress
- MacSyDBCapsule (Eduardo ROCHA - Microbial Evolutionary Genomics) - Closed
Related projects (19)
The CRISPR/Cas9 technology is a recent breakthrough in rapid genetic editing. A major part of getting the technology to work is the proper design of a guide RNA that will help Cas9 target specific genomic sequences. The design of this guide RNA must take into account all possible matches along an organism’s genome with as little as 50% similarity. Such a high tolerance for mismatches means that current alignment algorithms are not well suited to the task. This leads to suboptimal guide RNA design and/or a lengthy design process, a problem that is exacerbated when considering CRISPR/Cas9 for high-throughput applications. The development of a new class of sequence comparison algorithms is required.
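To make the cost of such tolerant matching concrete, here is a minimal Python sketch of a brute-force scan that reports every genome window within a given number of mismatches of a guide sequence. It is only an illustration, not the algorithm the project intends to develop: the function names and the `max_mismatches` parameter are hypothetical, only the forward strand is scanned, and insertions, deletions and PAM constraints are ignored. Under the assumption of a 20-nucleotide guide, 50% similarity would correspond to `max_mismatches=10`.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_tolerant_matches(genome, guide, max_mismatches):
    """Return (position, mismatches) for every window of the genome
    that differs from the guide at no more than max_mismatches positions.

    Brute force: every window is compared in full, so the running time
    grows with len(genome) * len(guide).
    """
    n = len(guide)
    hits = []
    for i in range(len(genome) - n + 1):
        d = hamming(genome[i:i + n], guide)
        if d <= max_mismatches:
            hits.append((i, d))
    return hits
```

With a tolerance this loose, a large fraction of windows can qualify, which is one reason seed-based aligners tuned for near-exact matches fit the task poorly.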
We are looking for software, whether or not based on SPADE, that runs in R and can analyze FCS cytometry files by multiparametric clustering. We would also be interested in viSNE. The files can be converted to CSV. A commercial software package, Cytobank, exists, but it does not provide all the desired functions.
Mapping of research themes and fields of expertise available in the Institut Pasteur International Network
Using data extracted from PubMed, we would like to develop a tool for the systematic analysis of research themes and fields of expertise available in the Institut Pasteur International Network (IPIN). The tool would be available to the Pasteur community and could be queried using PubMed search terms, identifying articles involving research teams from IPIN and displaying the names of authors, research units, and locations in a visual format. We hope this tool will enable researchers to identify colleagues for sharing expertise and developing collaborations.
When dealing with high-depth read data, a simple way to combine accurate analyses with moderate computational resources is to extract a subset of raw reads that yields a coverage depth that is both homogeneous and moderate. Unfortunately, current implementations are often unexpectedly slow and require significant pre-processing of large files to be used in practice. In the current scientific context, much effort must go into algorithm design and efficient programming to process large data with reasonable running times. An efficient implementation should therefore be developed to perform read coverage homogenization quickly. Such a tool will help to deal with highly redundant sequencing data by creating read subsets with useful properties. As the read coverage homogenization step is expected to be used systematically for pre-processing the large amount of raw reads generated in the PIBnet context, a development carried out by members of the CIB platform is expected to lead to efficient solutions that take advantage of the computing resources hosted by the Institut Pasteur.
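As an illustration of what read coverage homogenization can look like, the following Python sketch uses a k-mer based scheme in the spirit of digital normalization: a read is kept only if the median count of its k-mers, among the reads kept so far, is still below a target depth. This is a swapped-in, well-known technique and not necessarily the approach planned for this project. The function names and the `k` and `target` parameters are hypothetical, reverse complements are ignored, and k-mers are counted exactly in memory, which would not scale to real datasets.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-mer of seq (nothing if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def homogenize_coverage(reads, k=5, target=3):
    """Return the subset of reads whose estimated coverage is below target.

    The coverage of a read is estimated as the median count of its
    k-mers among the reads already kept.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read shorter than k: no estimate possible, skip it
        if median(counts[km] for km in read_kmers) < target:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute to counts
    return kept
```

For example, ten copies of the same read plus one unrelated read would be reduced to `target` copies of the redundant read while the rare read is retained. An efficient implementation would typically replace the exact counter with a compact probabilistic structure.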
Improve two existing tools, COV2HTML (published in 2014) and SEQ2HTML (private), and transform two scripts into interactive web interfaces aimed at biologists.
Hydrogen deuterium exchange detected by mass spectrometry (HDX-MS) is a powerful technique to probe the conformation and dynamics of proteins. Over the past 10 years, the HDX-MS workflow has been optimized and automated, leading to a rapid expansion of the technology in both academic labs and pharmaceutical companies. Thanks to these improvements, modern HDX-MS can be applied to investigate more complex biological systems, including large protein complexes and membrane proteins. However, the larger the protein under study, the more complex the HDX-MS data. Several noncommercial and commercial software solutions have been developed to help analyze HDX-MS data. We currently use DynamX 3.0, a Waters product specifically designed for the nanoACQUITY UPLC system with HDX technology. The aim of the project is to design and implement a statistical tool compatible with the output generated by DynamX to readily validate results obtained with large protein complexes.
DNA topoisomerase IB (Topo IB) enzymes are ubiquitous in eukaryotes, where they represent the major DNA topoisomerase I activity. However, Topo IB sequences are also found in other phyla, such as archaea and bacteria, as well as viruses. Given the large amount of sequenced data available in public databases, this project aims to infer a robust Topo IB gene tree based on a representative set of homologous sequences gathered from a large taxonomic sample.
Provision of a Hub bioinformatician for the bioinformatics analyses of the transcriptome and epigenome
The Transcriptome and Epigenome Platform develops high-throughput sequencing projects (collaboration and service) with teams on campus. These projects cover all of the campus's research themes as well as a wide range of organisms (from viruses to mammals). The platform carries out wet-lab activities (library construction and sequencing) and dry-lab activities (bioinformatics and statistical analyses). The person made available will interact closely with the other bioinformaticians of the BioMics pole and of the Hub. Their activities will include in particular:
- Participating in the design and set-up of projects with the requesting teams, taking charge of the analyses and reporting to users
- Setting up a bioinformatics analysis workflow for transcriptome/epigenome data in close collaboration with the C3BI, the DSI and the other bioinformaticians of the pole. This workflow will cover data quality control, preprocessing, mapping of sequences to reference genomes/transcriptomes, and counting of reads for the various annotation features
- Adapting the analysis workflow to the biological questions and organisms studied as part of the platform's activities
- Technology and literature watch (testing and validation of new analysis tools, updates of existing tools...)
- Setting up and developing analysis tools suited to the platform's future projects: single-cell RNA-seq, metatranscriptome, ChIP-seq, analysis of splicing isoforms. This will be done notably through dedicated analyses carried out with certain users. The tools set up and validated in this context will then be used for all projects.
- Communication and training activities (participation in meetings of the France Génomique consortium, continuing education at the Institut Pasteur…)
- Participation in other projects of the BioMics pole (subject to availability)
Bernd Jagla, who was the platform's bioinformatician, joined the Hub on 1 January 2016. Rachel Legendre has been made available since 2 November 2015 and replaces Bernd Jagla. I would like Rachel Legendre to be made available to the platform for a period of at least 2 years.
Development and use of statistical programs to analyze RNA-Seq data produced at the Transcriptome & Epigenome Platform
The Transcriptome & Epigenome Platform is dedicated to the development and use of high-throughput approaches for transcriptomics and epigenomics studies. The platform is accessible to any research team from the Pasteur Institute (80% of the projects) as well as from outside. It is involved (most often as a collaborator) in several projects funded by the ANR, AVIESAN and by the Pasteur Institute in the framework of the PTR programs. Next Generation Sequencing (NGS) based on the Illumina technology (HiSeq 2000/2500 sequencers) is used to perform RNA-sequencing experiments, which generate a large amount of data. After a first step involving bioinformatics, specific statistical methods must be used to analyze the data rigorously. These analyses are most often performed by the statistician(s) of the platform, who are also in charge of surveying the literature.
Development and use of statistical programs to analyze RNA-Seq data produced at the Transcriptome & Epigenome Platform
The Transcriptome & Epigenome Platform is dedicated to the development and use of high-throughput approaches for transcriptomics and epigenomics studies. The platform is accessible to any research team from the Pasteur Institute (80% of the projects) as well as from outside. It is involved (most often as a collaborator) in several projects funded by the ANR, Microbes and Brain, ERANET and by the Pasteur Institute in the framework of the PTR programs. Next Generation Sequencing (NGS) based on the Illumina technology (HiSeq 2000/2500 sequencers) is used to perform RNA-sequencing experiments, which generate a large amount of data. After a first step involving bioinformatics, specific statistical methods must be used to analyze the data rigorously. These analyses are most often performed by the statistician(s) of the platform, who are also in charge of surveying the literature.
The Transcriptome and EpiGenome platform has strong expertise in the bioinformatics and statistical analysis of RNA-seq data. Nevertheless, we receive more and more requests to use NGS to characterize the epigenome (using the ChIP-seq approach) or chromatin accessibility (by ATAC-seq). We thus need to further develop and validate analysis workflows for these types of data. This project aims at developing and formalizing collaboration between the platform and experts in this field at the Hub. This would include joint project kick-off meetings and the development and validation of ChIP-seq and ATAC-seq analysis pipelines (notably including data preprocessing, read mapping, peak calling...).
Development of a web application and new functionalities for the maintenance and curation of iPPI-DB
This project delivers a new version of iPPI-DB, a manually curated database that contains the structure, some physicochemical characteristics, the pharmacological data and the profile of the PPI targets of several hundred modulators of protein-protein interactions.
This new version will include:
- A maintenance application that facilitates and automates updates of the database, including the computation of the various physico-chemical properties of the modulators and chemical similarity screening on the Galaxy server of the Institut Pasteur.
- A new target-centric mode, based on the mapping of all druggable cavities at the core of PPI interfaces throughout the Protein Data Bank.
We have created an R program for the unsupervised analysis of cytometry files, and we need help optimizing it. We also need advice on optimizing the clustering. We have already met with Hugo Varet.
The ARIA (Ambiguous Restraints for Iterative Assignment) software, developed at the Structural Bioinformatics Unit, automates the treatment of NMR data and protein structure calculation by molecular dynamics simulation. To enhance the visibility of the software, it is necessary to develop a new web interface where users will be able to easily manage their data, perform calculations and analyze the results of the ARIA calculations.
The goal of the project is to determine whether there are differences in the midgut microbiome of our lab colonies of Aedes aegypti. We frequently observe various phenotypic differences between colonies of mosquitoes, and it is a recurring question whether these phenotypic differences result from differences in the microbiome. We will sequence the microbiome of 6 representative established lab colonies that were collected from geographically diverse areas and compare the bacterial communities between them. These data will help us dissect the contribution of midgut microbiome variation in lab colonies of Aedes aegypti to the phenotypic differences we observe in the lab.
Mood disorders such as bipolar and major depressive illnesses are among the most severe psychiatric disorders. They have a high prevalence and a chronic course, and are associated with significant mental and somatic comorbidities and high personal and societal costs (lost productivity and increased medical expenses). Patients with bipolar disorder (BD), for example, exhibit a reduced lifespan compared with the general population, a finding that cannot be explained solely by high suicide risk, reduced access to medical care and lifestyle factors. However, the pathophysiological mechanisms of BD are poorly understood, and patients often have an incomplete treatment response. Advanced mathematical approaches such as machine learning techniques are increasingly being used to generate predictions based on complex data, and they have been successfully used to detect a number of clinical outcomes and to predict behaviours. In combination with mobile technologies (e.g. smartphones, wearables) to collect behavioural, physiological and environmental data, these big data predictive approaches may provide a much richer and deeper understanding of the phenomenology and pathophysiological mechanisms of mood and bipolar disorders. By taking advantage of the high-standard bioinformatics expertise offered by the C3BI, this multidisciplinary, collaborative project aims to explore how clinical and biological factors may contribute to better characterizing BD patients, and to identify predictors of treatment response in BD. Our project also aims to explore how daily behavioural and physiological parameters may influence mood and behaviour in individuals at risk of or suffering from mood disorders.
There exists a broad biodiversity inside the Listeria monocytogenes species, which can be summarized by the existence of evolutionary lineages and more than 100 clonal complexes (CCs or clones) based
DISCO-Bac (http://disco-bac.web.pasteur.fr/), a Web server, is a part of a recent publication https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-017-3932-y (Co-authored by former Hub member