Hub members have a wide range of expertise, covering most fields of bioinformatics and biostatistics. Below is a non-exhaustive list of these areas of expertise.


Searched keyword : Tool Development

Related people (2)


Group : DETACHED - Detached : Labex milieu intérieur

After graduating from Paris VI University with a PhD in Genetics on the “Role of histone protein post-translational modifications in splicing regulation”, carried out in the Epigenetic Regulation unit at the Institut Pasteur, I completed two postdoctoral positions. I first worked for three years as a postdoctoral associate at the Whitehead Institute for Biomedical Research/MIT in Cambridge (USA), where my main project consisted of integrating genomic and epigenomic data to predict the transcription factors potentially at the core of cell-type-specific gene expression programs. I then joined the Institut Curie, where I deepened my experience in multi-omics data analysis and integration to identify non-coding RNAs involved in cancer progression. I recently joined the HUB-C3BI of the Institut Pasteur, where I perform high-throughput data integration to better understand biological complexity and contribute to the development of precision medicine.

ATAC-seq, ChIP-seq, Epigenomics, Non coding RNA, Pathway Analysis, RNA-seq, Single Cell, Systems Biology, Tool Development, Transcriptomics, Data integration, Graph theory and analysis, Cell biology and developmental biology
Projects (1)

Related projects (35)

Systems Biology Data Management and Analysis Platform (SYSMAP)

Most of the national and international projects conducted jointly by our lab and the GFMI unit (Anavaj Sakuntabhai) are driven by data collected at different (molecular, cellular, clinical) levels. The growing use and reuse of multiple data sources, large-scale technologies and ever-larger cohorts make state-of-the-art data management and analysis on a project-by-project basis very labor-intensive. To ensure the efficiency and quality of our data-rich research, we propose creating a Systems Biology data management and analysis platform that provides [a] extensible storage and retrieval of different linked datasets, [b] a standardized workflow system for documentation and efficient generation of reproducible results, and [c] versioning of input data, intermediate and end results, software, and parameters. A good Driving Project to guide development and implementation is a LabEx IBEID-funded thesis project in which we apply novel regression algorithms to discover biomolecules associated with the severity of dengue infection. Once the platform has proven successful for the Driving Project, we plan to extend it successively to other joint projects. Other necessary features, beyond extensions of [a]–[c], are [d] support for visualization of intermediate and end results, [e] a Web-based interface usable by non-expert users, and [f] support for provenance tracking. This project will significantly boost the speed and quality of our joint research between experimentation and modeling, and will provide a model for other laboratories involved in data-rich integrative biology. "The development of tools and software platforms that allow the integration of large-scale, diverse data sets into complex models that can then be operated upon and refined by experimentalists in an iterative fashion is perhaps the most critical milestone we must reach in the biological sciences if large-scale data and results are to impact on biological research routinely at all levels." (Eric Schadt 2009, Nature 461, p. 218)
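Requirement [c] (versioning of inputs, results, software and parameters) and feature [f] (provenance tracking) can be illustrated with a minimal sketch, assuming that each analysis run is identified by a hash of its input file contents, its parameters and its software versions. The function `provenance_record`, the `ResultStore` cache and the record layout are hypothetical names chosen for illustration; they are not components of SYSMAP.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(inputs: Dict[str, bytes],
                      params: Dict[str, Any],
                      software: Dict[str, str]) -> Dict[str, Any]:
    """Build a provenance record for one analysis run (hypothetical schema).

    Inputs are stored as content hashes, so any change to a dataset,
    a parameter or a tool version yields a different run_id.
    """
    record: Dict[str, Any] = {
        "inputs": {name: sha256_hex(content) for name, content in inputs.items()},
        "params": params,      # must be JSON-serializable
        "software": software,  # e.g. tool name -> version string
    }
    # Canonical JSON (sorted keys, no whitespace) makes the hash reproducible
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["run_id"] = sha256_hex(canonical.encode("utf-8"))
    return record


class ResultStore:
    """Hypothetical in-memory store that reuses results of identical runs."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def get_or_compute(self, record: Dict[str, Any],
                       compute: Callable[[], Any]) -> Any:
        """Return the cached result for this run_id, computing it only once."""
        run_id = record["run_id"]
        if run_id not in self._results:
            self._results[run_id] = compute()
        return self._results[run_id]
```

Because the run identifier changes whenever an input file, a parameter or a tool version changes, previously computed results can be reused safely and every end result can be traced back to exactly what produced it. A real platform would persist these records in a database rather than in memory.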

Project status : Declined

A reference panel of dengue vector genomes

Dengue prevention relies primarily on controlling populations of the main mosquito vector, Aedes aegypti, an approach that is failing in many parts of the world because of a lack of sustained resource commitment and ineffective implementation. Novel entomological approaches to dengue control are being developed that aim to replace or suppress mosquito vector populations. However, insufficient genomic resources for Ae. aegypti have until now impeded progress in both basic and applied research on this medically important mosquito species. The only available reference genome for Ae. aegypti is a draft consisting of over 4,800 unassembled fragments with incomplete annotation. Moreover, the inbred Ae. aegypti laboratory strain that was sequenced is not representative of the considerable genetic and ecological diversity of the species worldwide. The large size of the genome and its high content of repeat-rich transposable element sequences made the Ae. aegypti genome sequence particularly difficult to assemble. In the present project, we aim to overcome this difficulty using a novel strategy for genome sequencing and assembly. The ultimate goal is to produce several fully assembled, well-annotated new Ae. aegypti reference genomes from epidemiologically relevant populations. The expected outcome is a genome reference panel, including a catalog of species-wide genetic variation, that will significantly improve genomic resources for Ae. aegypti research and help address a broad range of biological questions related to Ae. aegypti vectorial capacity and dengue virus transmission.

Project status : Closed

A long-term mission for an assigned CIH-embedded bioinformatician to provide bioinformatic support to the CIH community

The Center for Human Immunology (CIH) supports researchers involved in translational research projects by providing access to 16 different cutting-edge technologies. The CIH currently hosts over 60 scientific projects from 8 departments of the Institut Pasteur and 5 external teams. To respond to the growing needs of these projects in single-cell analysis, the CIH has introduced a significant number of single-cell/single-molecule technologies over the past 2-3 years. These new technologies, such as the Personal Genome Machine (PGM) and Ion Proton sequencers, the iSCAN microarray scanner, Nanostring technology for transcriptomic profiling and the BioMark real-time PCR machine, generate large, high-dimensional datasets. The same trend in data complexity holds for flow cytometry, which now reaches over 20 parameters per cell. Exploring these data is generally beyond the expertise of scientists involved in translational research projects. To maximize the research outcomes obtained from these rich datasets, and to ensure that CIH users benefit from the full potential of our technologies, we require on-site bioinformatics support. A CIH-embedded bioinformatician would: 1) design and implement standard analysis pipelines for each of the data-rich technologies of the CIH; 2) provide regular ‘bioinformatics clinics’ where scientists can customize standard pipelines to their specific needs; 3) run training sessions on the ‘R software’ platform and other data analysis tools (such as Qlucore) of interest to CIH users. The objective is to empower users to run exploratory analyses by themselves, and to teach good practices in data management and data analysis.

Project status : In Progress

Listeriomics - Development of a web platform for visualization and analysis of Listeria omics data

Over the past three decades, Listeria has become a model organism for host-pathogen interactions, leading to critical discoveries in a broad range of fields including virulence-factor regulation, cell biology, and bacterial pathophysiology. More recently, the amount of Listeria “omics” data has increased exponentially, not only in volume but also in heterogeneity. There are now more than 40 published Listeria genomes, around 400 different transcriptomics datasets and 10 proteomics studies available. Analyzing these data through a systems biology approach, and building tools that let biologists analyze them on their own, is a challenge for bioinformaticians. To tackle these challenges, we are developing a web-based platform named Listeriomics, which integrates different types of tools for “omics” data manipulation, the two most important being: 1) a genome viewer that displays gene expression array, tiling array and RNA-seq data along with proteomics and genomics data; 2) an expression atlas, a query-based tool that connects every genomic element (genes, small RNAs, antisense RNAs) to the most relevant “omics” data. Our platform already integrates all genomics and transcriptomics data ever published on Listeria, and will thus allow biologists to analyze all these data dynamically and give bioinformaticians a central database for network analysis. It has already been used several times in our laboratory for different types of studies, including transcriptomics analysis in different biological conditions and whole-genome analysis of Listeria protein N-termini. This project is funded by an ANR Investissement d'avenir grant: BACNET 10-BINF-02-01
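The expression atlas idea, connecting each genomic element to the datasets where it is most relevant, can be sketched as an index keyed by element identifier. This is a minimal sketch under stated assumptions: "relevance" is approximated here by the absolute log2 fold change, and the class names, identifiers and thresholds are hypothetical, not the Listeriomics data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List


@dataclass(frozen=True)
class Measurement:
    """One expression value for one genomic element in one dataset."""
    dataset: str    # hypothetical dataset identifier
    element: str    # gene, small RNA or antisense RNA identifier
    log2_fc: float  # log2 fold change versus the dataset's reference condition


class ExpressionAtlas:
    """Hypothetical index linking each genomic element to its measurements."""

    def __init__(self) -> None:
        self._by_element: DefaultDict[str, List[Measurement]] = defaultdict(list)

    def add(self, m: Measurement) -> None:
        self._by_element[m.element].append(m)

    def query(self, element: str, min_abs_lfc: float = 1.0,
              top: int = 10) -> List[Measurement]:
        """Return datasets where the element changes most, strongest first."""
        # .get avoids inserting empty entries for unknown elements
        hits = [m for m in self._by_element.get(element, [])
                if abs(m.log2_fc) >= min_abs_lfc]
        hits.sort(key=lambda m: abs(m.log2_fc), reverse=True)
        return hits[:top]
```

Indexing by element rather than by dataset makes the atlas query (one element, all datasets) a single dictionary lookup, which is the access pattern the paragraph describes.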

Project status : Closed

Identification of new or unexpected pathogens, including viruses, bacteria, fungi and parasites associated with acute or progressive diseases

Microbial discovery remains a challenging task with many unmet medical and public health needs. Deep sequencing has profoundly transformed this field, whose central issues can be summarized in two questions: i) which pathogens, or associations of pathogens, are associated with diseases of unknown etiology, and ii) among the microbes infecting animal (including arthropod) reservoirs, which ones are able to infect large vertebrates, including humans. We are currently addressing these two questions, and our current request reflects Institut Pasteur's willingness to increase its contribution to, and visibility in, this area, in particular in relation with hospitals and the Institut Pasteur International Network (IPIN). We expect to identify new microbes associated with human diseases, which should pave the way for basic research programs focusing on virulence mechanisms and host specificity, and will also lead to phylogenetic and epidemiological studies (frequency of host infection, mode of transmission, etc.), as well as to the development of improved diagnostic tests for human infections. Our objective is also to contribute to the efforts of Institut Pasteur in the field of infectious diseases by building a pipeline, from sample to microbial identification, able to handle large cohorts of samples. This project is currently supported by the LABEX IBEID and the CITECH, and critically requires bio-IT support, which justifies this application. Partners include several hospitals, among them Necker-Enfants malades University Hospital for patients with progressive disease, several IPIN laboratories, as well as INRA and CIRAD for animal/arthropod reservoirs.
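The identification step of such a pipeline is often built on k-mer matching against reference genomes, as in k-mer-based classifiers such as Kraken. The sketch below is a deliberately simplified illustration of that idea, not the project's pipeline: it assigns each read by majority vote over shared k-mers (Kraken instead uses a lowest-common-ancestor scheme), assumes host reads have already been removed and that reference genomes fit in memory, and its function names and `min_hits` threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of length-k substrings of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the set of reference taxa that contain it."""
    index: Dict[str, Set[str]] = {}
    for taxon, genome in references.items():
        for km in kmers(genome, k):
            index.setdefault(km, set()).add(taxon)
    return index


def classify_read(read: str, index: Dict[str, Set[str]], k: int,
                  min_hits: int = 3) -> Optional[str]:
    """Assign a read to the taxon sharing the most k-mers, or None.

    Reads with too few matching k-mers stay unclassified; in a discovery
    setting these are the candidates for novel or divergent microbes.
    """
    votes: Counter = Counter()
    for km in kmers(read, k):
        for taxon in index.get(km, ()):
            votes[taxon] += 1
    if not votes:
        return None
    taxon, hits = votes.most_common(1)[0]
    return taxon if hits >= min_hits else None
```

Unclassified reads are the interesting output for the first question above, since they may come from microbes absent from current reference databases and would typically go on to assembly and more sensitive homology searches.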

Project status : In Progress

Development of top-down proteomics for clinical microbiology

Rapid and accurate identification of microorganisms is a prerequisite for appropriate patient care and infection control. In the last decade, mass spectrometry (MS) has revolutionized clinical microbiology with the introduction of MALDI-TOF for rapid microbial identification. However, MALDI-TOF MS suffers from important limitations. Some bacteria remain difficult to identify, either because they do not give a specific profile or because the database lacks the appropriate reference. In addition, the discriminatory power of the technique is often insufficient to reliably differentiate sub-species within species or clones within sub-species. More importantly, virulence or resistance determinants cannot be characterized, which is a severe obstacle to appropriate patient care and antibiotic prescription in hospitals. In recent years, proteomics approaches have been increasingly used to study host-pathogen interactions. State-of-the-art bottom-up approaches rely on the enzymatic digestion of proteins and LC-MS/MS analysis of the resulting peptides. In contrast, top-down proteomics is an emerging technology based on the analysis of intact proteins by high-resolution mass spectrometry. Its major advantage is the ability to address protein variations and characterize proteoforms arising from alternative splicing, allelic variation, or post-translational modification. We have recently set up a robust top-down proteomics platform for the analysis of intact bacterial proteomes. Our final objective is to use this platform to better characterize bacterial pathogens in a clinical context, but a major prerequisite for this goal is building accurate bacterial proteoform databases.
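To make the notion of a proteoform database concrete, the sketch below enumerates candidate intact masses for one protein sequence under combinations of modifications, including removal of the initiator methionine, a frequent processing event in bacteria. It assumes standard monoisotopic residue masses and a small, hypothetical set of modifications; `proteoform_masses`, `match` and the 10 ppm default tolerance are illustrative choices, not the platform's database schema. Real top-down searches also score fragment ions, not only intact mass.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Monoisotopic residue masses in daltons (standard amino acids)
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once for the free N- and C-termini

# A few common modification mass shifts (monoisotopic, Da); illustrative subset
PTM_SHIFT: Dict[str, float] = {
    "acetyl": 42.01057,
    "oxidation": 15.99491,
    "phospho": 79.96633,
}

Entry = Tuple[Tuple[str, ...], float]  # (modification labels, intact mass)


def intact_mass(sequence: str) -> float:
    """Monoisotopic mass of an unmodified protein sequence."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def proteoform_masses(sequence: str, candidate_ptms: List[str],
                      max_ptms: int = 2) -> List[Entry]:
    """Enumerate masses for combinations of up to max_ptms modifications,
    with and without the initiator methionine."""
    backbones = [("", sequence)]
    if sequence.startswith("M") and len(sequence) > 1:
        backbones.append(("-Met", sequence[1:]))
    entries: List[Entry] = []
    for tag, seq in backbones:
        base = intact_mass(seq)
        for n in range(max_ptms + 1):
            for combo in combinations(candidate_ptms, n):
                label = tuple(x for x in (tag,) + combo if x)
                entries.append((label, base + sum(PTM_SHIFT[p] for p in combo)))
    return sorted(entries, key=lambda e: e[1])


def match(observed: float, entries: List[Entry], ppm: float = 10.0) -> List[Entry]:
    """Return database entries within a ppm tolerance of an observed mass."""
    tol = observed * ppm * 1e-6
    return [e for e in entries if abs(e[1] - observed) <= tol]
```

Listing the same modification twice in `candidate_ptms` (for example two `"phospho"` entries) allows doubly modified forms; the combinatorial growth of this list is one reason accurate, curated proteoform databases are a prerequisite for clinical use.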

Project status : Closed

JASS: an online tool for the joint analysis of GWAS summary statistics

In recent years, large genome-wide association studies (GWAS) have identified thousands of significant genetic associations for multiple traits and diseases [1]. In the course of this endeavor, sample size has proven to be the key factor for identifying new variants. For example, GWAS of body mass index (BMI), now including up to 350,000 individuals from more than 100 cohorts, have identified genetic variants explaining as little as 0.02% of BMI variance [2]. While standard approaches for detecting new genetic variants associated with traits and diseases will continue as sample sizes increase, multivariate analyses have been proposed as an alternative strategy both to improve the detection of new variants and to explore the multidimensional components of complex traits and diseases. Intuitively, multivariate analysis can improve the detection of variants with pleiotropic effects [3] by accumulating moderate evidence of association across multiple traits and diseases. Several recent studies have reported not only overlapping GWAS hits across related traits [4] but also genome-wide shared genetic effects [5]. Multivariate analyses of GWAS have also proven useful for understanding shared genetics between diseases [5] and potential causal relationships between phenotypes using Mendelian randomization (MR) [6]. Importantly, most existing multivariate methods are based on GWAS summary statistics, while approaches based on individual-level data have seldom been considered because of major practical and ethical issues. Building on ongoing work on multi-phenotype analysis (Aschard et al. 2014 [7], Aschard et al. 2015 [8]), we developed an effective and robust multivariate approach for GWAS summary statistics that addresses the major barriers of existing approaches, i.e., the correlation between studies that arises when the GWAS analyzed share samples [9-16]. Our approach consists of a robust omnibus multivariate test of GWAS summary statistics.
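A common form of omnibus multivariate test on summary statistics takes the vector z of per-trait z-scores for one variant and the k × k matrix Σ of correlations between those z-scores under the null (for instance, induced by shared samples), and computes T = zᵀΣ⁻¹z, which follows a χ² distribution with k degrees of freedom when the variant is associated with none of the traits. The standard-library sketch below implements that generic test; it is not JASS's code, and how Σ is estimated (for example from null variants) is left as an assumption.

```python
import math
from typing import List, Sequence, Tuple


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrix is left untouched
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def chi2_sf(x: float, k: int) -> float:
    """P(X > x) for a chi-square variable with integer k degrees of freedom.

    Uses the closed forms of the regularized upper incomplete gamma
    function Q(k/2, x/2) for integer and half-integer shapes.
    """
    y = x / 2.0
    if k % 2 == 0:
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(k // 2))
    series = sum(y ** (j + 0.5) / math.gamma(j + 1.5) for j in range((k - 1) // 2))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * series


def omnibus_test(z: Sequence[float],
                 sigma: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Omnibus statistic T = z' Sigma^-1 z and its chi-square(k) p-value.

    z     : z-scores of one variant across k traits/studies
    sigma : k x k null correlation matrix of those z-scores (assumed known)
    """
    w = solve(sigma, z)  # w = Sigma^-1 z without forming the inverse
    t = sum(zi * wi for zi, wi in zip(z, w))
    return t, chi2_sf(t, len(z))
```

With two traits each at z = 2, treating the studies as independent gives T = 8, whereas a null correlation of 0.5 (for example from overlapping samples) reduces T to 16/3 ≈ 5.33 and yields a larger p-value. This illustrates why ignoring sample overlap inflates significance.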

Project status : Closed

Build software to decipher Gephyrin alternative transcripts obtained by long-read sequencing

Disruption of GABAergic inhibitory circuits is one of the common alterations responsible for several psychiatric developmental disorders. Gephyrin (GPHN) is the main molecular organizer of inhibitory synapses. It acts beneath the postsynaptic membrane as a hub for multiple protein-protein interactions. Intriguingly, inhibitory synapses are highly heterogeneous, displaying diverse inhibitory postsynaptic potential (IPSP) properties as well as specific subcellular localizations on their target neurons. The molecular mechanism responsible for this diversity is still unknown, although it could result in part from the regulation of alternative splicing, producing specific GPHN isoforms with distinct properties. Interestingly, exons alternatively included in Gphn transcripts are proposed to alter the binding of the GPHN protein to inhibitory receptors as well as its oligomerization. Alternative splicing regulation of Gphn expression thus provides a plausible molecular mechanism to finely regulate several aspects of inhibitory synapse development; however, this regulatory step is still largely unexplored. In collaboration with Fabrice Ango (IGF-Montpellier), we have designed an experimental approach to sequence GPHN transcripts using Pacific Biosciences and Oxford Nanopore technologies. It was applied to samples prepared from mouse and human tissues. To date, we have obtained Pacific Biosciences sequences, and our interaction with E. Kornobis has yielded preliminary data revealing a high level of complexity in the alternative transcripts expressed by GPHN in mouse brain samples. However, we are struggling to cluster these sequences and assemble alternative GPHN transcripts with a bioinformatic pipeline able to properly distinguish subtle sequence variations from the sequencing errors associated with long-read sequencing.
Results obtained with currently available solutions, such as the PacBio IsoSeq3 analysis pipeline, led us to conclude that a more suitable software solution is needed, especially to properly characterize Gephyrin splicing diversity. We propose to build a new bioinformatic pipeline to analyze our data, applicable to long-read sequencing data obtained with both Pacific Biosciences and Oxford Nanopore technologies.
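One way to separate genuine splicing variation from long-read sequencing errors, assuming reads have already been aligned to the genome with a splice-aware aligner, is to compare reads by their chain of splice junctions rather than by raw sequence: errors tend to shift a junction by a few bases, whereas an alternative exon changes the chain itself. The sketch below is a hypothetical greedy implementation of that idea, not the proposed pipeline; the 5-base tolerance and the minimum support of 2 reads are illustrative defaults, not values validated for Gephyrin.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Junction = Tuple[int, int]  # (donor, acceptor) genomic coordinates


@dataclass
class IsoformCluster:
    """Reads sharing the same intron chain, up to a coordinate tolerance."""
    representative: List[Junction]
    read_ids: List[str] = field(default_factory=list)


def same_chain(a: List[Junction], b: List[Junction], tol: int) -> bool:
    """True if both chains have the same number of junctions and every
    donor/acceptor coordinate differs by at most tol bases."""
    if len(a) != len(b):
        return False
    return all(abs(d1 - d2) <= tol and abs(a1 - a2) <= tol
               for (d1, a1), (d2, a2) in zip(a, b))


def cluster_reads(reads: List[Tuple[str, List[Junction]]], tol: int = 5,
                  min_reads: int = 2) -> List[IsoformCluster]:
    """Greedy clustering of aligned long reads by splice-junction chain.

    Small junction shifts are treated as sequencing/alignment noise;
    clusters supported by fewer than min_reads reads are discarded as
    likely artefacts.
    """
    clusters: List[IsoformCluster] = []
    for read_id, chain in reads:
        for cl in clusters:
            if same_chain(chain, cl.representative, tol):
                cl.read_ids.append(read_id)
                break
        else:
            # No existing cluster matched: this read seeds a new candidate isoform
            clusters.append(IsoformCluster(representative=chain, read_ids=[read_id]))
    return [cl for cl in clusters if len(cl.read_ids) >= min_reads]
```

Using the first read of each cluster as its representative keeps the sketch short; a more robust design would refine each representative from the consensus junction positions of its members and could add known exon annotations as anchors.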

Project status : Closed