Hub members have a wide range of expertise, covering most fields of bioinformatics and biostatistics. Below is a non-exhaustive list of these areas of expertise.
Searched keyword: Tool Development
Related people (2)
After obtaining a PhD in Genetics from Paris VI University on the “Role of histone protein post-translational modifications in splicing regulation”, carried out in the Epigenetic Regulation unit at the Institut Pasteur, I completed two postdoctoral positions. I first worked for three years as a postdoctoral associate at the Whitehead Institute for Biomedical Research/MIT in Cambridge (USA). My main project consisted of integrating genomic and epigenomic data to predict the transcription factors potentially at the core of the regulation of cell-type-specific gene expression programs. I then joined the Institut Curie, where I deepened my experience in multi-omics data analysis and integration to identify non-coding RNAs involved in cancer progression. I recently joined the HUB-C3BI of the Institut Pasteur, where I perform high-throughput data integration to better understand biological complexity and contribute to the development of precision medicine.
ATAC-seq, ChIP-seq, Epigenomics, Non coding RNA, Pathway Analysis, RNA-seq, Single Cell, Systems Biology, Tool Development, Transcriptomics, Data integration, Graph theory and analysis, Cell biology and developmental biology
A computer scientist by training, I apply this background to biological problems and am particularly interested in the modelling of biological systems, knowledge inference, ontologies and data visualisation.
Algorithmics, Data Visualization, Metabolomics, Modeling, Pathway Analysis, Phylogenetics, Systems Biology, Tool Development, Database, Program development, Scientific computing, Databases and ontologies, Application of mathematics in sciences, Software development and engineering, Data and text mining, Evolution, Data integration, Graph theory and analysis, Workflow and pipeline development, Discrete and numerical optimization
Virus, Human Immunodeficiency virus (HIV)
- Modeling mitochondrial metabolism in dormant Cryptococcus neoformans (Benjamin HOMMEL - Molecular Mycology) - In Progress
- Measles virus protein C interplay with cellular apoptotic pathways; applications for cancer treatment (Alice MEIGNIÉ - Viral Genomics and Vaccination) - In Progress
- Spread of HIV drug-resistance mutations: models and estimation methods (Olivier GASCUEL - Evolutionary Bioinformatics) - In Progress
Related projects (29)
Candida albicans is responsible for the majority of life-threatening fungal infections in hospitalized patients and is also the most frequently isolated fungal commensal of humans. The C. albicans population includes at least 18 phylogenetic groups (or clades). Specific phenotypes can distinguish isolates within a given clade from those in other clades, yet the relationships between the natural genetic and phenotypic diversity of C. albicans have not been explored in depth. We have sequenced the diploid genomes of >150 C. albicans isolates belonging to the 12 major C. albicans clades, selected from a collection of commensal/clinical isolates previously used to characterize the population structure. The aim of this project is to develop the tools needed for an in-depth analysis of these genome sequences, so that we can ask questions about the extent of C. albicans genetic diversity, the contribution of loss of heterozygosity to this diversity, and the history of the C. albicans population.
Modification of the MacSyFinder model and profile data structure to facilitate model distribution and integration into the Mobyle platform. This consists of organizing the XML models and HMM profiles of a given system or set of systems so that detection of all the macromolecular systems in a given folder can be run at once. This will also ensure that profiles are always distributed with the appropriate system definitions, thus avoiding profile-name collisions.
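A minimal sketch of the naming idea, assuming a hypothetical layout in which each model package ships its own HMM profile files. Qualifying every profile by its package name gives each one a unique identifier, and a simple check flags bare names shared by several packages. The package and profile names are illustrative, and this is not MacSyFinder's actual loading code.

```python
from collections import defaultdict


def profile_name(filename: str) -> str:
    """Strip the '.hmm' extension from a profile file name."""
    return filename[:-4] if filename.endswith(".hmm") else filename


def qualified_index(packages: dict) -> dict:
    """Map 'package/profile' identifiers to profile file names.

    `packages` maps a (hypothetical) model-package name to the list of
    HMM profile files distributed with it.
    """
    return {
        f"{package}/{profile_name(filename)}": filename
        for package, files in packages.items()
        for filename in files
    }


def bare_name_collisions(packages: dict) -> dict:
    """Return profile names that occur in more than one package."""
    owners = defaultdict(list)
    for package, files in packages.items():
        for filename in files:
            owners[profile_name(filename)].append(package)
    return {name: pkgs for name, pkgs in owners.items() if len(pkgs) > 1}


# Hypothetical packages that both ship a profile called pilQ.hmm.
packages = {
    "SecretionSystems": ["gspD.hmm", "pilQ.hmm"],
    "Pili": ["pilQ.hmm", "pilA.hmm"],
}
print(bare_name_collisions(packages))  # {'pilQ': ['SecretionSystems', 'Pili']}
```

With package-qualified identifiers, `SecretionSystems/pilQ` and `Pili/pilQ` can coexist even though their bare names collide.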
Most of the national and international projects carried out between our lab and the GFMI unit (Anavaj Sakuntabhai) are driven by data collected at different (molecular/cellular/clinical) levels. The increased use/reuse of multiple data sources, large-scale technologies and increasing cohort sizes make state-of-the-art data management and analysis on a project-by-project basis very labor-intensive. To ensure the efficiency and quality of our data-rich research, we propose the creation of a Systems Biology data management and analysis platform that provides [a] extensible storage and retrieval of different linked datasets, [b] a standardized workflow system for documentation and efficient generation of reproducible results, and [c] versioning of input data, intermediate and end results, software, and parameters. A good Driving Project to guide development and implementation is a LabEx IBEID-funded thesis project in which we apply novel regression algorithms to discover biomolecules associated with the severity of dengue infection. Once the platform has proven successful for the Driving Project, we plan to extend it successively to other joint projects. Other necessary features beyond extensions of [a]–[c] are [d] support for visualization of intermediate and end results, [e] a Web-based interface for usability by non-expert users, and [f] support for provenance tracking. This project will significantly boost the speed and quality of our joint research between experimentation and modeling, and will provide a model for other laboratories involved in data-rich integrative biology. "The development of tools and software platforms that allow the integration of large-scale, diverse data sets into complex models that can then be operated upon and refined by experimentalists in an iterative fashion is perhaps the most critical milestone we must reach in the biological sciences if large-scale data and results are to impact on biological research routinely at all levels." (Eric Schadt 2009, Nature 461, p. 218)
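Features [c] and [f] can both rest on one simple idea: derive a deterministic identifier from everything that determines a result. The sketch below shows one possible approach and does not describe the proposed platform. It hashes a canonical JSON record of input-file checksums, parameters and software versions, so any change to one of them yields a new identifier. The record fields and example values are assumptions.

```python
import hashlib
import json


def checksum(data: bytes) -> str:
    """SHA-256 digest of raw input-file content."""
    return hashlib.sha256(data).hexdigest()


def run_id(inputs: dict, parameters: dict, software: dict) -> str:
    """Deterministic identifier for one analysis run.

    inputs:     input name -> content checksum
    parameters: JSON-serializable analysis parameters
    software:   tool name -> version string
    """
    record = {"inputs": inputs, "parameters": parameters, "software": software}
    # sort_keys makes the identifier independent of dict insertion order.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative values only.
inputs = {"cohort.csv": checksum(b"patient,severity\n1,mild\n")}
rid = run_id(inputs, {"alpha": 0.5, "folds": 10}, {"python": "3.11"})
```

Storing each result under such an identifier makes it easy to tell whether the result is still current after an input, a parameter or a tool version changes.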
Mapping of research themes and fields of expertise available in the Institut Pasteur International Network
Using data extracted from PubMed, we would like to develop a tool for the systematic analysis of the research themes and fields of expertise available in the Institut Pasteur International Network (IPIN). The tool would be available to the Pasteur community and could be queried with PubMed search terms. It would identify articles involving IPIN research teams and display the authors' names, research units and locations in a visual format. We hope this tool will help researchers identify colleagues with whom to share expertise and develop collaborations.
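As a starting point for such a tool, PubMed can be queried programmatically through NCBI's E-utilities. The sketch below only builds an ESearch URL that combines a topic with PubMed's `[ad]` (affiliation) field tag. Identifying IPIN teams by the string "Institut Pasteur" is an assumption; a real tool would need a curated list of affiliation names for the network's institutes.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(topic: str, affiliation: str = "Institut Pasteur", retmax: int = 100) -> str:
    """Build an ESearch URL for PubMed articles on `topic` with a given affiliation."""
    # [ad] restricts the affiliation phrase to PubMed's affiliation field.
    term = f'({topic}) AND "{affiliation}"[ad]'
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = esearch_url("dengue")
```

You can fetch this URL with `urllib.request.urlopen`. The JSON response holds the matching PMIDs under `esearchresult` → `idlist`. Author and affiliation details can then be retrieved with EFetch. NCBI limits request rates, so batch queries accordingly.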
When dealing with high-depth read data, a simple way to combine accurate analyses with moderate computational resources is to extract a subset of raw reads with homogeneous and moderate coverage depth. Unfortunately, current implementations are often unexpectedly slow and require significant pre-processing of large files before they can be used in practice. In the current scientific context, much effort must go into algorithm design and efficient programming to process large data with reasonable running times. An efficient implementation should therefore be developed to perform read coverage homogenization quickly. Such a tool will help deal with highly redundant sequencing data by creating read subsets with useful properties. The read coverage homogenization step is expected to be used systematically to pre-process the large amount of raw reads generated in the PIBnet context. A development carried out by members of the CIB platform should therefore lead to efficient solutions that take advantage of the computing resources hosted by the Institut Pasteur.
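The text does not say which algorithm the tool would use. One well-known approach to coverage homogenization is digital normalization. A read is kept only if the median abundance of its k-mers among already-kept reads is below a target depth, so a single pass discards redundant reads from highly covered regions. The minimal in-memory sketch below ignores reverse complements and uses an exact `Counter`. Tools meant for large datasets typically rely on probabilistic k-mer counting instead.

```python
from collections import Counter
from statistics import median


def kmers(read: str, k: int) -> list:
    """All overlapping k-mers of a read (forward strand only)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def normalize(reads, k: int = 21, cutoff: int = 20) -> list:
    """Single-pass digital normalization toward a target coverage `cutoff`."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:  # read shorter than k: cannot be evaluated
            continue
        # Estimated coverage of this read = median count of its k-mers so far.
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept
```

Because each read is evaluated once against counts from previously kept reads, memory grows with the number of distinct k-mers kept, not with total sequencing depth.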
Whole-genome sequencing of microbial agents for disease surveillance, outbreak investigation, epidemiology and population biology
The PIBnet initiative is a joint effort by the above laboratories to modernize their activities, including collection management and microbial characterization approaches and technologies. Within this large concerted effort, a priority is to promote WGS as the major characterization approach for microbial agents in surveillance and outbreak investigation. Our ambition is for the Institut Pasteur NRCs to have adopted WGS as a routine strain characterization method for epidemiological surveillance and outbreak investigation by the end of 2016. The target volume is 10,000 genomes a year. On the bioinformatics level, this requires implementing (1) fast data treatment tools and (2) genotyping/classification schemes and methods to extract medically relevant information from genomic sequences (resistome, virulome).
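For the genotyping step, the simplest classification scheme is an MLST-style lookup. Once an allele number has been called at each locus of the scheme, the combination of allele numbers is matched against a table of known profiles to give a sequence type (ST). The three-locus scheme and profiles below are invented for illustration. Real schemes, such as classical seven-locus MLST or cgMLST, come from curated databases, and allele calling happens upstream.

```python
from typing import Optional

# Hypothetical scheme: locus names, and ST -> allele numbers (in LOCI order).
LOCI = ("locusA", "locusB", "locusC")
PROFILES = {1: (1, 1, 1), 2: (1, 2, 1), 3: (3, 2, 4)}
PROFILE_TO_ST = {alleles: st for st, alleles in PROFILES.items()}


def sequence_type(calls: dict) -> Optional[int]:
    """Return the ST for a dict of locus -> allele number.

    Returns None when a locus is missing (incomplete profile) or when the
    combination is not in the table (candidate novel ST).
    """
    key = tuple(calls.get(locus) for locus in LOCI)
    if None in key:
        return None
    return PROFILE_TO_ST.get(key)
```

In practice, the two `None` cases should be reported separately: a missing locus points to a sequencing or assembly problem, whereas a new combination may warrant assigning a new ST.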
Improvement of two existing tools, COV2HTML (published in 2014) and SEQ2HTML (private), and transformation of two scripts into interactive web interfaces aimed at biologists.
The tool MacSyFinder and the viewer MacSyView provide a framework to model macromolecular systems in genomes, detect them precisely, and visualize them in their genomic context. These tools were co-developed by Sophie Abby, Bertrand Néron, Hervé Ménager, Marie Touchon and Eduardo Rocha. The Institut Pasteur has filed a US patent application for MacSyFinder. The aim of the project is to increase the value and accessibility of these tools by designing a web interface and making it possible to run MacSyFinder on a remote server at the Institut Pasteur, either with a set of predefined system models or with user-supplied models. This will make these tools more easily accessible to a broad audience of biologists.
Integrons drive the spread of antibiotic resistance genes in bacterial populations. While their role is widely known (integron+antibiotic+resistance in PubMed gives > 1700 hits), there was no software available to identify integrons and their different components. We have written a Python script that performs the following steps:
- it annotates DNA sequences (with prodigal);
- it identifies the integron integrase (using HMM protein profiles);
- it searches for attC sites (using covariance models built from structurally annotated multiple sequence alignments of DNA sequences);
- it searches for certain types of promoters (string search using Python functions);
- it outputs all of this, together with the genes in the cassettes (between attC sites).
We would like the program to be available as a web server via Mobyle (or another convenient route). This would facilitate its use by most biologists, who could still use the scripts if they wish. It would also facilitate the publication of the methodological manuscript.
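The last step, reporting the genes in each cassette, reduces to an interval question. The minimal sketch below follows the text's definition of cassette genes as those lying between consecutive attC sites. It assumes that genes and attC sites have already been located and are given as 1-based inclusive coordinates on a single strand. The `Feature` type is hypothetical, and genes upstream of the first attC site are not reported.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_between_attc(genes, attc_sites):
    """Return, for each pair of consecutive attC sites, the genes lying between them."""
    sites = sorted(attc_sites, key=lambda s: s.start)
    ordered = sorted(genes, key=lambda g: g.start)
    groups = []
    for left, right in zip(sites, sites[1:]):
        # A gene belongs to this interval only if it lies entirely between the two sites.
        groups.append([g.name for g in ordered if g.start > left.end and g.end < right.start])
    return groups


genes = [Feature("g1", 100, 400), Feature("g2", 600, 900), Feature("g3", 1100, 1400)]
attc = [Feature("attC1", 450, 520), Feature("attC2", 950, 1010), Feature("attC3", 1450, 1510)]
print(genes_between_attc(genes, attc))  # [['g2'], ['g3']]
```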
Dengue prevention relies primarily on controlling populations of the main mosquito vector, Aedes aegypti. This approach is failing in many parts of the world because of a lack of sustained commitment of resources and ineffective implementation. Novel entomological approaches to dengue control, aiming to replace or suppress mosquito vector populations, are being developed. Until now, however, insufficient genomic resources for Ae. aegypti have impeded progress in both basic and applied research on this medically important mosquito species. The only available reference genome for Ae. aegypti is a draft consisting of over 4,800 unassembled fragments with incomplete annotation. Moreover, the inbred Ae. aegypti laboratory strain that was sequenced is not representative of the considerable genetic and ecological diversity of the species worldwide. The large size of the genome and its high content of repeat-rich transposable element sequences were major obstacles to assembling the Ae. aegypti genome sequence. In the present project, we aim to overcome this difficulty using a novel strategy for genome sequencing and assembly. The ultimate goal is to produce several new, fully assembled and well-annotated Ae. aegypti reference genomes from epidemiologically relevant populations. The expected outcome is a genome reference panel, including a catalog of species-wide genetic variation, that will significantly improve genomic resources for Ae. aegypti research. It will also help address a broad range of biological questions related to Ae. aegypti vectorial capacity and dengue virus transmission.
Pasteur International Bioresources Network (PIBnet) bioinformatics: whole-genome sequencing of microbial agents for disease surveillance, outbreak investigation, epidemiology and population biology
The PIBnet initiative is a joint effort by 15 National Reference Centers (NRC), 8 Collaborative Centers of the World Health Organization, the Collection de l’Institut Pasteur & Cyanobacteria collection and the CIBU to modernize their activities, including collection management and microbial characterization approaches and technologies. Within this large concerted effort, a priority is to promote whole-genome sequencing (WGS) as the major characterization approach for microbial agents in surveillance and outbreak investigation. Our ambition is for the Institut Pasteur to have adopted WGS as a routine strain characterization method for epidemiological surveillance and outbreak investigation by the end of 2016. The target volume is 10,000 genomes a year. On the bioinformatics level, this requires implementing fast data treatment tools, databases, genotyping schemes and methods to extract medically relevant information from genomic sequences (resistome, virulome).
A long-term mission for an assigned CIH-embedded bioinformatician to provide bioinformatic support to the CIH community
The Center for Human Immunology (CIH) supports researchers involved in translational research projects by providing access to 16 different cutting-edge technologies. The CIH currently hosts over 60 scientific projects from 8 departments of the Institut Pasteur and 5 external teams. To meet the growing needs of these projects in single-cell analysis, the CIH has introduced a significant number of single-cell/single-molecule technologies over the past 2-3 years. These new technologies include the Personal Genome Machine (PGM) and Ion Proton sequencers, the iSCAN microarray scanner, Nanostring technology for transcriptomic profiling and the BioMark real-time PCR machine, and they generate large, high-dimensional datasets. The same trend in data complexity holds for flow cytometry technologies (currently reaching over 20 parameters per cell). Exploring these data is generally beyond the scope of scientists involved in translational research projects. We require dedicated, embedded bioinformatics support for two reasons: to maximize the research outcomes of these rich datasets, and to ensure that CIH users benefit from the full potential of our technologies. A CIH-embedded bioinformatician would: 1) design and implement standard analysis pipelines for each of the data-rich technologies of the CIH; 2) provide regular ‘bioinformatics clinics’ where scientists can adapt standard pipelines to their specific needs; 3) run training sessions on the ‘R software’ platform and other data analysis tools of interest to CIH users (such as Qlucore). The objective is to enable users to run exploratory analyses themselves, and to teach good practices in data management and data analysis.
Over the past three decades, Listeria has become a model organism for host-pathogen interactions, leading to critical discoveries in a broad range of fields including virulence-factor regulation, cell biology and bacterial pathophysiology. More recently, the amount of Listeria “omics” data produced has increased exponentially, not only in volume but also in heterogeneity. There are now more than 40 published Listeria genomes, around 400 different transcriptomics datasets and 10 proteomics studies available. Analyzing these data through a systems biology approach, and building tools that let biologists analyze them themselves, is a challenge for bioinformaticians. To tackle these challenges, we are developing a web-based platform named Listeriomics, which integrates different types of tools for “omics” data manipulation. The two most important are: 1) a genome viewer that displays gene expression array, tiling array and RNA-seq data along with proteomics and genomics data; 2) an expression atlas, a query-based tool that connects every genomic element (genes, small RNAs, antisense RNAs) to the most relevant “omics” data. Our platform already integrates all genomics and transcriptomics data ever published on Listeria. It will thus allow biologists to analyze all these data dynamically, and give bioinformaticians a central database for network analysis. Finally, it has already been used several times in our laboratory for different types of studies, including transcriptomics analysis in different biological conditions and whole-genome analysis of Listeria protein N-termini. This project is funded by an ANR Investissement d'avenir: BACNET 10-BINF-02-01
Identification of new or unexpected pathogens, including viruses, bacteria, fungi and parasites associated with acute or progressive diseases
Microbial discovery remains a challenging task with many unmet medical and public health needs. Deep sequencing has profoundly transformed this field, whose challenges can be summarized in two questions. First, which pathogens or combinations of pathogens are associated with diseases of unknown etiology? Second, among the microbes infecting animal (including arthropod) reservoirs, which ones can infect large vertebrates, including humans? We are currently addressing these two questions. Our current request reflects the Institut Pasteur's wish to increase its contribution to, and visibility in, this field, in particular in relation to hospitals and the Institut Pasteur International Network (IPIN). We expect to identify new microbes associated with human diseases. This should pave the way for basic research programs on virulence mechanisms and host specificity, and lead to phylogenetic and epidemiological studies (frequency of host infection, mode of transmission, etc.). It should also support the development of improved diagnostic tests for human infections. Our objective is also to contribute to the Institut Pasteur's efforts in infectious diseases by building a pipeline, from sample to microbial identification, able to handle large cohorts of samples. This project is currently supported by the LABEX IBEID and the CITECH. It critically requires bio-IT support, which justifies this application. Partners include several hospitals, among them Necker-Enfants malades University Hospital for patients with progressive disease, several IPIN laboratories, and INRA and CIRAD for animal/arthropod reservoirs.
Rapid and accurate identification of microorganisms is a prerequisite for appropriate patient care and infection control. In the last decade, mass spectrometry (MS) has revolutionized clinical microbiology with the introduction of MALDI-TOF for rapid microbial identification. However, MALDI-TOF MS suffers from significant limitations. Some bacteria remain difficult to identify, either because they do not give a specific profile or because the database lacks the appropriate reference. In addition, the discriminatory power of the technique is often insufficient to reliably differentiate sub-species within species or clones within sub-species. More importantly, virulence or resistance determinants cannot be characterized, which is a severe obstacle to appropriate patient care and antibiotic prescription in hospitals. In recent years, proteomics approaches have been increasingly used to study host-pathogen interactions. State-of-the-art bottom-up approaches rely on the enzymatic digestion of proteins and LC-MS/MS analysis of the resulting peptides. In contrast, top-down proteomics is an emerging technology based on the analysis of intact proteins by high-resolution mass spectrometry. The major advantage of top-down proteomics is its ability to address protein variation and characterize proteoforms arising from alternative splicing, allelic variation or post-translational modification. We have recently set up a robust top-down proteomics platform for the analysis of intact bacterial proteomes. Our final objective is to use this platform to better characterize bacterial pathogens in a clinical context. A major requirement for this goal is to build accurate bacterial proteoform databases.
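Proteoform database entries are typically matched on intact mass, so each candidate proteoform needs a theoretical mass. The minimal sketch below assumes standard residues plus explicit modification deltas. It sums monoisotopic residue masses and adds one water for the termini. For illustration, it also covers a well-known bacterial processing event, removal of the initiator methionine, with or without N-terminal acetylation. The text does not specify which proteoforms a real database would enumerate.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O for the free N- and C-termini
ACETYL = 42.010565  # C2H2O added by N-terminal acetylation


def intact_mass(sequence: str, deltas=()) -> float:
    """Monoisotopic mass of an intact protein plus any modification deltas."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + sum(deltas)


def candidate_proteoforms(sequence: str) -> dict:
    """Illustrative proteoforms: full length, and initiator Met removed (+/- acetylation)."""
    forms = {"full-length": intact_mass(sequence)}
    if sequence.startswith("M") and len(sequence) > 1:
        forms["Met removed"] = intact_mass(sequence[1:])
        forms["Met removed, N-acetyl"] = intact_mass(sequence[1:], (ACETYL,))
    return forms
```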
Integrons drive the spread of antibiotic resistance genes in bacterial populations. While their role is widely known (integron+antibiotic+resistance in PubMed gives > 1700 hits), there was no software available to identify integrons and their different components. We developed IntegronFinder (with Bertrand Néron) for this purpose (Cury, NAR, 2016). We would like to make the following improvements to IntegronFinder:
- Unit tests to facilitate further development of the tool, especially since Jean Cury (the main author) will leave the lab in the summer of 2017.
- The possibility of using multi-FASTA files as input.
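The two requested improvements fit together: a reader that accepts multi-FASTA input, and unit tests that pin down its behaviour. This minimal standard-library sketch is not IntegronFinder's actual parser, and using the first word of each header as the record ID is an assumption. A library parser such as Biopython's `SeqIO` could serve the same purpose.

```python
import io
import unittest


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a single- or multi-FASTA text stream."""
    header, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            words = line[1:].split()
            # Keep only the first word of the header as the record ID.
            header, chunks = (words[0] if words else ""), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


class ReadFastaTest(unittest.TestCase):
    def test_multi_fasta(self):
        data = io.StringIO(">rep1 first replicon\nACGT\nTT\n\n>rep2\nGGCC\n")
        self.assertEqual(list(read_fasta(data)), [("rep1", "ACGTTT"), ("rep2", "GGCC")])

    def test_single_record(self):
        self.assertEqual(list(read_fasta(io.StringIO(">a\nAC\n"))), [("a", "AC")])

    def test_sequence_before_header(self):
        with self.assertRaises(ValueError):
            list(read_fasta(io.StringIO("ACGT\n>a\nAC\n")))
```

If the code is saved as a module (for example, a hypothetical `fasta_reader.py`), the tests run with `python -m unittest fasta_reader`.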
A novel MacSyFinder module for detection of bacterial capsule systems on the future Galaxy platform.
Extracellular capsules constitute the outermost layer of some bacteria and establish the first contact between the cell and its environment. They are major virulence factors involved in, among other things, antibiotic tolerance and immune escape. Bacterial capsules are also known to mediate non-specific cell-to-cell interactions between and across species and to affect population structure and dynamics. Despite the many studies characterizing capsule diversity and its role during infection in a few model species, most evolutionary and ecological questions have yet to be addressed. Furthermore, there is no tool to identify and annotate capsule systems across all prokaryotic genomes. We have built such a tool and would like to make it available.
In recent years, large genome-wide association studies (GWAS) have identified thousands of significant genetic associations for multiple traits and diseases [1]. In the course of this endeavor, sample size has proven to be the key factor for identifying new variants. For example, GWAS of body mass index (BMI), now including up to 350,000 individuals from more than 100 cohorts, have identified genetic variants that explain as little as 0.02% of BMI variance [2]. Standard approaches for detecting new variants associated with traits and diseases will continue to be applied as sample sizes increase. Multivariate analyses, however, have been proposed as an alternative strategy, both to improve the detection of new variants and to explore the multidimensional components of complex traits and diseases. Intuitively, multivariate analysis can improve the detection of variants with a pleiotropic effect [3] by accumulating moderate evidence of association across multiple traits and diseases. Several recent studies have reported not only GWAS hit overlap across related traits [4] but also genome-wide shared genetic effects [5]. Multivariate analyses of GWAS have also proven useful for understanding shared genetics between diseases [5] and potential causal relationships between phenotypes using Mendelian randomization (MR) [6]. Importantly, most existing multivariate methods are based on GWAS summary statistics. Approaches based on individual-level data have seldom been considered because of major practical and ethical issues. Building on ongoing work on multi-phenotype analysis (Aschard et al. 2014 [7], Aschard et al. 2015 [8]), we developed an effective and robust multivariate approach for GWAS summary statistics. It addresses the major barrier of existing approaches, namely the correlation between studies that arises when the analyzed GWAS share samples [9-16]. Our approach consists of a robust omnibus multivariate test of GWAS summary statis
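The text is cut off before it specifies the test, so the following is an assumption about its general form, not the authors' method. Consider a vector z of k Z-scores at one variant, with a null correlation matrix Σ that captures the correlation induced by shared samples. A standard omnibus test uses T = zᵀ Σ⁻¹ z, which follows a χ² distribution with k degrees of freedom under the null. A minimal sketch with NumPy and SciPy:

```python
import numpy as np
from scipy.stats import chi2


def omnibus_test(z, sigma):
    """Omnibus chi-square test of k correlated Z-scores for one variant.

    z:     length-k vector of Z-scores (one per trait or study)
    sigma: k x k null correlation matrix of the Z-scores (how to estimate
           it, e.g. from Z-scores at presumably null variants, is not shown)
    Returns (T, p) with T = z' sigma^-1 z ~ chi-square(k) under the null.
    """
    z = np.asarray(z, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    stat = float(z @ np.linalg.solve(sigma, z))  # avoids forming sigma^-1 explicitly
    return stat, float(chi2.sf(stat, df=z.size))


# Same Z-scores, without and with correlation between the two studies.
print(omnibus_test([2.0, 2.0], np.eye(2)))                 # T = 8.0
print(omnibus_test([2.0, 2.0], [[1.0, 0.5], [0.5, 1.0]]))  # T ≈ 5.33
```

Ignoring the correlation, as in the first call, overstates the evidence for this variant. This is the bias that sample overlap introduces.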
Our objective is to study how genetic recombination generates diversity within enterovirus ecosystems.
Extracellular capsules constitute the outermost layer of some bacteria and establish the first contact between the cell and its environment. They are major virulence factors involved in, among other things, antibiotic tolerance and immune escape. Bacterial capsules are also known to mediate non-specific cell-to-cell interactions between and across species and to affect population structure and dynamics. Despite the many studies characterizing capsule diversity and its role during infection in a few model species, most evolutionary and ecological questions have yet to be addressed. Furthermore, there is no tool to identify and annotate capsule systems across all prokaryotic genomes. Our previous project led to the installation of our tool on the Galaxy server. The goal now is to make the data from our capsule survey (>2000 genomes) available to the community.
CRISPR polymorphism is a powerful tool for subtyping Salmonella strains and is now used routinely in epidemiological investigations. The aim of this project is to transfer and upgrade a published, widely used web tool that extracts the spacer content from FASTA sequences or paired-end reads.
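At its core, extracting spacer content from an assembled sequence means cutting the CRISPR array at each copy of the direct repeat. This minimal sketch assumes that the repeat sequence is known and that every copy matches it exactly; the repeat used here is hypothetical. A production tool would also need to tolerate degenerate repeats, search the reverse complement, and handle arrays split across paired-end reads.

```python
def extract_spacers(sequence: str, repeat: str) -> list:
    """Return the spacers lying between consecutive exact copies of `repeat`."""
    parts = sequence.upper().split(repeat.upper())
    # parts[0] precedes the first repeat and parts[-1] follows the last one,
    # so only the inner pieces are flanked by repeats on both sides.
    return parts[1:-1]


REPEAT = "GTTTCAGACGG"  # hypothetical direct repeat
array = "AAAA" + REPEAT + "ACGTACGTAA" + REPEAT + "TTGGCCAATT" + REPEAT + "CCCC"
print(extract_spacers(array, REPEAT))  # ['ACGTACGTAA', 'TTGGCCAATT']
```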
CRISPR-Cas systems provide immunity to bacteria and archaea. One of the reasons these systems have attracted so much attention in the past few years is the discovery, among the Cas proteins, of nucleases that are guided by small RNAs to bind and degrade homologous DNA. Introducing DNA breaks, which can then be repaired in either a controlled or an uncontrolled manner, is now a widely used method to introduce mutations into genomes. We are interested in probing the efficiency of the CRISPR-Cas system for different targets.
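Comparing efficiency across targets starts with listing candidate target sites. The text does not name the nuclease. The sketch below assumes an SpCas9-like enzyme that recognizes a 20-nt protospacer immediately followed by an NGG PAM, and enumerates candidate sites on one strand. The reverse strand would be scanned the same way on the reverse complement.

```python
import re


def find_targets(sequence: str, protospacer_len: int = 20) -> list:
    """List (start, protospacer, PAM) for each NGG PAM on the given strand.

    Assumes an SpCas9-like nuclease; `start` is the 0-based protospacer start.
    """
    seq = sequence.upper()
    hits = []
    # Zero-width lookahead so that overlapping PAMs (e.g. in 'AGGG') are all found.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start >= protospacer_len:  # need a full protospacer upstream
            start = pam_start - protospacer_len
            hits.append((start, seq[start:pam_start], match.group(1)))
    return hits
```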
To study the neuronal mechanisms underlying the generation of distinct memories, it is necessary to perform experiments in which the sensory elements of the environment are under the precise control o
Disruption of GABAergic inhibitory circuits is one of the common alterations responsible for several psychiatric developmental disorders. Gephyrin (GPHN) is the common and main molecular organizer of i