Hub members have many areas of expertise, covering most fields in bioinformatics and biostatistics. Below is a non-exhaustive list of these areas.
Searched keyword: Genome assembly
Related people (5)
Emna joined the C3BI in 2016 and has worked actively in the IGDA platform on research and education. She is now also part of the Viral Populations and Pathogenesis Unit (PVP).
Genome assembly, Sequence analysis, Program development, Data integration, Read mapping, LIMS, Parallel computing, Gene prediction, Shotgun metagenomics
I joined the Bioinformatics and Biostatistics Hub at Institut Pasteur in 2016, where I am currently developing NGS pipelines for the Biomics Pôle. I have interdisciplinary research experience: after a PhD in Astronomy (gravitational wave data analysis), I joined several research institutes to work in the fields of plant modelling (INRIA, Montpellier, 2008-2011), Systems Biology, in particular logical modelling (EMBL-EBI, Cambridge, U.K., 2011-2015), and drug discovery (Sanger Institute, Cambridge, U.K., 2015). On a daily basis, I use data analysis and machine learning techniques within high-quality software to tackle scientific problems.
Algorithmics, Data management, Data Visualization, Genome assembly, Genomics, Machine learning, Modeling, Scientific computing, Databases and ontologies, Software development and engineering, Data and text mining, Illumina HiSeq, Graph theory and analysis, Illumina MiSeq
I work as a research engineer in the Bioinformatics and Biostatistics HUB of the Institut Pasteur. Holder of a PhD in bioinformatics, my main interest is on fast but robust phylogenetic inference algorithms and methods from large genome-scaled datasets. In consequence, I am often involved in related bioinformatics projects, such as performing de novo or ab initio genome assemblies, designing and processing core genome typing schemes, building and analysing phylogenomics datasets, or implementing and distributing novel tools and methods.
Algorithmics, Clustering, Genome assembly, Genomics, Genotyping, Phylogenetics, Taxonomy, Genome analysis, Program development, Evolution, Sequence homology analysis
- High-throughput sequencing (NGS) and processing of DNA sequences from the variable domains of alpaca single-chain antibodies (VHH domains or Nanobodies®) (Margarida GOMES - Antibody Engineering) - Pending
- Antimalarial drug resistance in Africa: A comprehensive molecular analysis of the emergence of artemisinin resistant parasites in Africa (Didier MENARD - Biology of Host-parasite Interactions) - In Progress
- Implementation of a fast cgMLST genotyping algorithm (Valérie BOUCHEZ - Molecular Prevention and Therapy of Human Diseases) - In Progress
After a PhD in informatics on graph analysis (metabolic networks and sRNA-mRNA interaction graphs) at the LaBRI (Université de Bordeaux), I joined the DSIMB team (INTS) for a post-doc on structural modeling. I then completed a second post-doc at Metagenopolis – INRA Jouy-en-Josas, where I was introduced to the analysis of metagenomic data. I was recruited to the HUB in 2015, and since then I have pursued the development of methods dedicated to the analysis of metagenomic data, combining sequencing data processing, statistics, protein structural modeling and graph analysis.
Algorithmics, Clustering, Genome assembly, Genomics, Metabolomics, Modeling, Non coding RNA, Sequence analysis, Structural bioinformatics, Targeted metagenomics, Database, Genome analysis, Biostatistics, Program development, Scientific computing, Databases and ontologies, Exploratory data analysis, Data and text mining, Illumina HiSeq, Comparative metagenomics, Read mapping, Illumina MiSeq, Sequence homology analysis, Gene prediction, Multidimensional data analysis, Sequencing, Shotgun metagenomics
- Targeted search of specific commensals in 16S databases (Pamela SCHNUPF - Molecular Microbial Pathogenesis) - In Progress
- Microbiota dysbiosis in human colon cancer (Iradj SOBHANI - Other) - Pending
- Environmental and human surveillance of polioviruses, VDPVs, and other enteroviruses in Madagascar and the impact during the switch from tOPV to bOPV (Patsy POLSTON - Biology of Enteric Viruses) - In Progress
2015 – . – Institut Pasteur, Paris, France – Unit: Bioinformatics and Biostatistics HUB
2012 – 2015 – Institut Pasteur, Paris, France – Unit: Molecular Genetics of Yeasts. Supervisor: Prof. B. Dujon
2012 – Institut Pasteur, Paris, France – Unit: Integrated Mycobacterial Pathogenomics. Supervisor: Dr. R. Brosch
Education
2012 – MSc. Bioinformatics – Université Paris Diderot (Paris VII)
Genome assembly, Sequence analysis, Genome analysis, Orthology and paralogy analysis, Read mapping, Sequence homology analysis, DNA structure analysis, Genome rearrangements, Motifs and patterns detection
- Duplications in bacteriophage genomes. (Luisa DE SORDI - Molecular Biology of Gene in Extremophiles) - New
- De novo sequencing and analysis of three unassigned species of non-tuberculous mycobacteria. (RIM GHARBI - Integrated Mycobacterial Pathogenomics) - Awaiting Publication
- Differentiation of Shigella species from Escherichia coli by MALDI-TOF mass spectrometry (Sophie LEFÈVRE - Enteric Bacterial Pathogens) - In Progress
Related projects (16)
Candida albicans is responsible for the majority of life-threatening fungal infections occurring in hospitalized patients and is also the most frequently isolated fungal commensal of humans. The C. albicans population includes at least 18 phylogenetic groups (or clades). Specific phenotypes can distinguish isolates within a given clade from those in other clades, and yet the relationships between the natural genetic and phenotypic diversity of C. albicans have not been explored in depth. We have sequenced the diploid genomes of >150 C. albicans isolates selected from a collection of commensal/clinical isolates previously used to characterize the population structure and belonging to the 12 major C. albicans clades. The aim of this project is to develop the tools necessary for an in-depth analysis of these genome sequences, allowing us to ask questions about the extent of C. albicans genetic diversity, the contribution of loss-of-heterozygosity to this diversity, and the history of the C. albicans population.
Candida albicans is responsible for the majority of life-threatening fungal infections occurring in hospitalized patients and is also the most frequently isolated fungal commensal of humans. Microevolution of C. albicans isolates has been observed in a number of instances, characterized in particular by loss-of-heterozygosity events. Yet most studies that have investigated such microevolution have not used whole-genome sequencing. In this project, we aim to characterize C. albicans microevolution at the genome-wide level. To this end, we will take advantage of multiple isolates that were collected at the same time from healthy individuals and share the same molecular type, thus providing information on the extent of genetic diversity of commensal isolates. We will also take advantage of series of isolates collected from patients with different forms of candidiasis and/or who have received antifungal therapy, thus providing information on the impact of pathogenic interaction and antifungal treatment on genome dynamics.
The 2013-2015 Ebola virus disease epidemic is the largest outbreak described so far, with 27,305 cases and 11,169 deaths. The virus spread by human-to-human contact throughout Western Africa, and never before has a variant been transmitted for such a sustained period of time. Ebola viruses are RNA viruses and, like other RNA viruses, can accumulate mutations as they evolve. It is therefore urgent to monitor viral changes and adaptation within and between individuals, in order to help researchers better understand susceptibility to Ebola infection, to guide research on therapeutic targets and to ensure accurate diagnosis. New technologies can provide information about pathogen evolution, and in our lab we have access to an Ion PGM™ sequencer. Thanks to the national reference center for viral hemorrhagic fever (VHF), we have at our disposal a large number of samples collected from Ebola-infected patients, especially from Guinea. To produce our libraries, we have developed an amplicon approach using sixteen pairs of Ebola virus-specific primers and an RNA sequencing method based on randomly primed cDNA synthesis. The Ion PGM™ Hi-Q sequencing kit will be used to sequence inserts of up to 400 bp loaded onto 316v2™ or 318v2™ chips. Through high-depth sequencing, we would like to profile intra- and inter-host viral quasispecies at different times of the epidemic in the geographic area of the Ebola Treatment Centre in Macenta. Thanks to the activities of the national reference center for VHF and the Biomics Pole, another aim of the project is to occasionally compare viral quasispecies and consensus sequences between patients who present uncommon symptoms and those with classical illness, and to study intra-host quasispecies in different biological fluids (cerebrospinal fluid, semen, urine) to see whether the persistent species differ from the viral quasispecies found during the symptomatic phase.
Whole-genome sequencing of microbial agents for disease surveillance, outbreak investigation, epidemiology and population biology
The PIBnet initiative is a joint effort by the above laboratories to modernize their activities, including collection management and microbial characterization approaches and technologies. Within this large concerted effort, a priority is to promote WGS as the major characterization approach of microbial agents for surveillance and outbreak investigation. Our ambition is to have shifted to WGS as a routine strain characterization method for epidemiological surveillance and outbreak investigation in the Institut Pasteur NRCs by the end of 2016. The target volume is 10,000 genomes a year. On the bioinformatics level, this requires implementing (1) fast data treatment tools and (2) genotyping/classification schemes and methods to extract medically relevant information from genomic sequences (resistome, virulome).
De novo assembly of a bacterial strain so that it can be submitted to an annotation platform.
How ribosomal protein gene position impacts genome evolution during a long-term evolution experiment.
Increasing evidence indicates that nucleoid spatiotemporal organization is crucial for bacterial physiology, since these microorganisms lack a compartmentalized nucleus. However, it is still unclear how gene order within the chromosome can influence cell physiology. In silico approaches have shown that genes involved in transcription and translation processes, in particular ribosomal protein (RP) genes, tend to be located near the replication origin (oriC) in fast-growing bacteria, suggesting that such a positional bias might be an evolutionarily conserved growth-optimization strategy. Recently we systematically relocated a locus containing half of the ribosomal protein genes (S10) to different genomic positions in Vibrio cholerae. These experiments revealed drastic differences in growth rate and infectivity within this isogenic strain set. We showed that the genomic positioning of ribosomal protein genes is crucial for physiology because it provides a replication-dependent higher dosage in fast-growing conditions. It might therefore play a key role in the genome evolution of bacterial species. We aim to observe how the genomic positioning of these genes influences the evolution of Vibrio cholerae. To gain insight into the evolutionary consequences of relocating RP genes, we evolved either the wild type or the most affected strains for 1000 generations in fast-growing conditions. NGS will be performed and analyzed on the evolved populations to understand the genetic changes responsible for the observed phenotypic changes.
Dengue prevention relies primarily on controlling populations of the main mosquito vector, Aedes aegypti, which is failing in many parts of the world because of a lack of sustained commitment of resources and ineffective implementation. Novel entomological approaches to dengue control are being developed that aim at replacing or suppressing mosquito vector populations. Insufficient genomic resources for Ae. aegypti, however, have until now impeded progress in both basic and applied research on this medically important mosquito species. The only available reference genome for Ae. aegypti is a draft that consists of over 4,800 unassembled fragments with incomplete annotation. Moreover, the inbred Ae. aegypti laboratory strain that was sequenced does not universally represent the considerable genetic and ecological diversity of the species worldwide. The large size of the genome and its high content of repeat-rich transposable element sequences were a major obstacle to assembling the Ae. aegypti genome sequence. In the present project, we aim to overcome this difficulty using a novel strategy for genome sequencing and assembly. The ultimate goal is to produce several fully assembled, well-annotated new Ae. aegypti reference genomes from epidemiologically relevant populations. The expected outcome is a genome reference panel including a catalog of species-wide genetic variation that will significantly improve genomic resources for Ae. aegypti research and help address a broad range of biological questions related to Ae. aegypti vectorial capacity and dengue virus transmission.
Pasteur International Bioresources Network (PIBnet) bioinformatics: whole-genome sequencing of microbial agents for disease surveillance, outbreak investigation, epidemiology and population biology
The PIBnet initiative is a joint effort by 15 National Reference Centers (NRC), 8 Collaborative Centers of the World Health Organization, the Collection de l’Institut Pasteur & Cyanobacteria collection and the CIBU to modernize their activities, including collection management and microbial characterization approaches and technologies. Within this large concerted effort, a priority is to promote whole genome sequencing (WGS) as the major characterization approach of microbial agents for surveillance and outbreak investigation. Our ambition is to have shifted to WGS as a routine strain characterization method for epidemiological surveillance and outbreak investigation in the Institut Pasteur by the end of 2016. The target volume is 10,000 genomes a year. On the bioinformatics level, this requires implementing fast data treatment tools, databases, genotyping schemes and methods to extract medically relevant information from genomic sequences (resistome, virulome).
A major program of evolutionary and comparative genomics of yeasts has been in progress in my laboratory for many years (see publications). In the next few months (before summer 2015) I need to finish a few comparisons on a new clade in order to publish as soon as possible.
Horizontal gene transfer (HGT) is a major driving force of bacterial diversification. For mycobacteria, a special type of HGT was described in Mycobacterium smegmatis which is linked to distributive conjugal transfer (Gray et al., PLoS Biology, 2013). In the current project we are trying to reproduce the results and explore the process.
We're interested in the production of a toxin by Klebsiella pneumoniae. We have a mutant unable to produce this toxin, and would like to identify the mutation responsible for this inability.
Dissecting the peptidoglycan trafficking machinery using gene trap mutagenesis in near-haploid human cells
It has previously been found that the peptidoglycan (PGN) degradation fragments DAP-containing muramyl tripeptide (M-triDAP) and muramyl dipeptide (MDP) stimulate the innate immune receptors Nod1 and Nod2. However, it remains to be clarified how the fragments reach Nod1 and Nod2, since these receptors are intracellular. The aim of the project is thus to investigate novel factors in peptidoglycan signalling, using gene trap mutagenesis in human near-haploid cells to randomly knock out genes and perform a genome-wide screen to establish a library. There is a pressing need to find novel therapeutic strategies, and this gene-finding technique has been shown to work well for that purpose (see Carrette et al., 2011). Results will be validated in relevant mouse models and epithelial cell lines using CRISPR-Cas.
Escherichia coli is one of the major bacterial pathogens that are responsible for numerous nosocomial infections. While most of the E. coli infections are rather related to colonisation of the urinary
Trichosporon asahii is a yeast responsible for invasive human infections worldwide. Currently, no genotyping method is available to determine the relationship between clinical isolates. At the NRCMA we have more than 40 clinical isolates and 2 collection strains associated with clinical data. Thanks to the P2M facility, the whole genomes of 33 isolates were sequenced. The aim of this project is to study the genetic diversity of Trichosporon asahii and its potential relationship with clinical and/or phenotypic data, and finally to propose a new genotyping method that could be useful for clinicians in case of a local or national outbreak.
We wish to analyse the sequences of seven Mycobacterium marinum mutants generated using increasing concentrations of a candidate antibiotic whose target is unknown.