Hub members have a wide range of expertise, covering most fields of bioinformatics and biostatistics. Below is a non-exhaustive list of these areas of expertise.
Searched keyword: Phylogenetics
Related people (4)
I work as a research engineer in the Bioinformatics and Biostatistics HUB of the Institut Pasteur. I hold a PhD in bioinformatics, and my main interest is fast but robust phylogenetic inference algorithms and methods for large genome-scale datasets. Consequently, I am often involved in related bioinformatics projects, such as performing de novo or ab initio genome assemblies, designing and processing core genome typing schemes, building and analysing phylogenomic datasets, or implementing and distributing novel tools and methods.
Algorithmics, Clustering, Genome assembly, Genomics, Genotyping, Phylogenetics, Taxonomy, Genome analysis, Program development, Evolution, Sequence homology analysis
- Long-term preservation of a Campylobacter fetus strain and genomic stability (Dominique CLERMONT - Collection of the Institut Pasteur (CIP)) - In Progress
- DNMT and RNMT in Leishmania (Gerald SPAETH - Molecular Parasitology and Signaling) - In Progress
- High-throughput sequencing (NGS) and processing of DNA sequences of the variable domains of alpaca single-chain antibodies (VHH domains or Nanobodies®) (Margarida GOMES - Antibody Engineering) - In Progress
After a PhD in Microbiology on bacterial toxin-antitoxin systems at the Free University of Brussels, I joined the Institut Pasteur for a three-year postdoc in Eduardo Rocha’s lab. During this period, I performed comparative genomics and phylogenetic analyses of bacterial conjugation and type IV secretion systems. I then worked for two years in Olivier Tenaillon’s team on the modelling and evolution of organismal complexity. I joined the HUB in 2015, and I am involved in phylogenetic and comparative genomics projects.
Genomics, Phylogenetics, Sequence analysis, Genome analysis, Genetics, Evolution, Population genetics
- Evaluation of the mutation rate per site and dN/dS in the genomes of Yersinia enterocolitica (Cyril SAVIN - Yersinia) - Awaiting Publication
- Phylogenetic analysis of HHD-PDZ-containing proteins (Nicolas WOLFF - Channel Receptors) - Awaiting Publication
- Phylogenetic analysis of insect-specific flaviviruses (Artem BAIDALIUK - Insect-Virus Interactions) - Closed
After a Master's degree in bioinformatics and biostatistics, I did a PhD in computer science/bioinformatics at University Paris-Sud (now part of University Paris-Saclay), where I worked on the integration and analysis of comparative genomics data. After a postdoc in Lausanne, Switzerland, where I worked on small-RNA sequencing data, I joined GenoSplice, where I was responsible for developing bioinformatics projects related to next-generation sequencing. I joined Institut Pasteur in Nov. 2015 to work in the Evolutionary Bioinformatics Unit and to help develop new tools and algorithms that can efficiently handle the ever-increasing amount of sequencing data.
Algorithmics, Data management, Phylogenetics, Sequence analysis, Database, Genome analysis, Program development, Scientific computing, Databases and ontologies, Sequencing, Workflow and pipeline development
A computer scientist by training, I apply this knowledge to biological problems and am particularly interested in the modelling of biological systems, knowledge inference, ontologies and data visualisation.
Algorithmics, Data Visualization, Metabolomics, Modeling, Pathway Analysis, Phylogenetics, Systems Biology, Tool Development, Database, Program development, Scientific computing, Databases and ontologies, Application of mathematics in sciences, Software development and engineering, Data and text mining, Evolution, Data integration, Graph theory and analysis, Workflow and pipeline development, Discrete and numerical optimization
Virus, Human Immunodeficiency virus (HIV)
- Modeling mitochondrial metabolism in dormant Cryptococcus neoformans (Benjamin HOMMEL - Molecular Mycology) - Closed
- Measles virus protein C interplay with cellular apoptotic pathways; applications for cancer treatment (Alice MEIGNIÉ - Viral Genomics and Vaccination) - Closed
- Spread of HIV drug-resistance mutations: models and estimation methods (Olivier GASCUEL - Evolutionary Bioinformatics) - In Progress
Related projects (27)
The phylogenetic position and status of “giant viruses”, formerly called NucleoCytoplasmic Large DNA viruses (NCLDV) or the putative order Megavirales, are controversial. Many preliminary phylogenetic analyses have been published, but their interpretations are usually strongly biased by the authors' preconceptions about the nature of giant viruses. Our own preliminary analyses suggest that giant viruses are indeed ancient (they predate the last universal eukaryotic ancestor) and may have provided important functions to emerging eukaryotic cells (e.g. DNA topoisomerase activities). The number of giant virus genomes has recently increased dramatically, opening new opportunities to study their position in the “universal tree of life” and their evolutionary relationships with eukaryotes. The aim of the project is to perform an exhaustive phylogenetic analysis of all giant virus proteins with eukaryotic (archaeal/bacterial) homologues in order to (i) test the monophyly of giant viruses, (ii) determine their contribution to early eukaryotic evolution, and (iii) determine whether some giant virus proteins can be used to root the eukaryotic tree. We need the help of a bioinformatics colleague with solid expertise in building phylogenetic trees from large datasets using different tree-construction and robustness-evaluation methods. This work will be complemented by a systematic search, carried out by two members of the BMGE team (Patrick Forterre and Morgan Gaia), for significant indels (insertions/deletions) in the resulting alignments.
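As an illustration of what a first pass of such an indel search could look like, here is a minimal Python sketch that lists every gap block (a maximal run of `-`) in a multiple sequence alignment, together with the sequences that share it. The function name `shared_gap_blocks` and the toy alignment are hypothetical; this is not the BMGE team's procedure, and deciding whether an indel is significant (conserved flanks, taxonomic distribution) would require further filtering.

```python
import re
from collections import defaultdict


def shared_gap_blocks(alignment: dict[str, str]) -> dict[tuple[int, int], list[str]]:
    """Map each gap block (0-based start, exclusive end) to the sequences that carry it."""
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    blocks: dict[tuple[int, int], list[str]] = defaultdict(list)
    for name, seq in alignment.items():
        # Every maximal run of '-' is treated as one candidate indel.
        for match in re.finditer(r"-+", seq):
            blocks[(match.start(), match.end())].append(name)
    # Sort blocks by position and sequence names alphabetically for stable output.
    return {span: sorted(names) for span, names in sorted(blocks.items())}


# Toy protein alignment (hypothetical sequences).
aln = {
    "virusA": "MKV--LTA",
    "virusB": "MKV--LSA",
    "euk1": "MKVQQLTA",
    "euk2": "MRVQQL-A",
}
print(shared_gap_blocks(aln))
# {(3, 5): ['virusA', 'virusB'], (6, 7): ['euk2']}
```

Grouping sequences by identical gap coordinates is the simplest possible criterion; a real analysis would also tolerate slightly shifted gap boundaries produced by the aligner.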
Candida albicans is responsible for the majority of life-threatening fungal infections in hospitalized patients and is also the most frequently isolated fungal commensal of humans. The C. albicans population includes at least 18 phylogenetic groups (or clades). Specific phenotypes can distinguish isolates within a given clade from those in other clades, yet the relationships between the natural genetic and phenotypic diversity of C. albicans have not been explored in depth. We have sequenced the diploid genomes of more than 150 C. albicans isolates belonging to the 12 major C. albicans clades, selected from a collection of commensal and clinical isolates previously used to characterize the population structure. The aim of this project is to develop the tools needed for an in-depth analysis of these genome sequences, so that we can ask questions about the extent of C. albicans genetic diversity, the contribution of loss of heterozygosity to this diversity, and the history of the C. albicans population.
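One common way to look for loss-of-heterozygosity (LOH) tracts in a diploid genome is to scan each chromosome for long stretches with few or no heterozygous sites. The sketch below assumes that heterozygous positions have already been called (for example from a VCF file); it counts them in fixed windows and merges consecutive low-count windows into candidate tracts. The function name, window size and threshold are illustrative assumptions, not the project's method.

```python
def candidate_loh_tracts(het_positions, chrom_length, window=10_000, max_het=0):
    """Return 1-based (start, end) tracts of consecutive windows with <= max_het heterozygous sites."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in het_positions:  # 1-based heterozygous positions
        counts[(pos - 1) // window] += 1

    tracts = []
    start = None  # index of the first window of the current low-heterozygosity run
    for i, count in enumerate(counts):
        if count <= max_het:
            if start is None:
                start = i
        elif start is not None:
            tracts.append((start * window + 1, i * window))
            start = None
    if start is not None:  # a run that extends to the chromosome end
        tracts.append((start * window + 1, chrom_length))
    return tracts


# Hypothetical heterozygous positions on an 8 kb toy chromosome, 1 kb windows.
print(candidate_loh_tracts([500, 1500, 2500, 6500, 7500], 8_000, window=1_000))
# [(3001, 6000)]
```

Window size trades resolution against noise: small windows flag short tracts but also random gaps in heterozygosity.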
Across the bacterial, archaeal and eukaryotic domains of life, heat shock proteins (HSPs) are defined as a class of highly conserved chaperones that are rapidly induced in response to temperature increases through dedicated heat shock transcription factors. While this transcriptional response governs the adaptation of fungal, plant and animal cells to heat shock and other forms of stress, early-branching eukaryotes of the order Kinetoplastida, including trypanosomatid parasites, lack classical mechanisms of transcriptional regulation and express HSPs largely constitutively. This raises important questions about the function of HSPs in the absence of stress and about how their chaperone activity is regulated in response to environmental adversity. Understanding parasite-specific mechanisms of stress-response regulation is especially relevant for protozoan parasites of the genus Leishmania, which are adapted to survive inside the highly toxic phagolysosomes of host macrophages and cause the various immunopathologies of leishmaniasis. To gain first insight into the role of the heat shock response in Leishmania differentiation and pathogenicity, we are studying the evolution and function of members of the HSP70 protein family, combining bioinformatics and transgenic approaches.
We wish to investigate the conservation and prevalence of small ORFs of unknown function that are expressed during the stationary phase of growth in Salmonella.
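Surveying the conservation of small ORFs across genomes usually starts from a systematic scan for candidate ORFs. The sketch below is a simplified, hypothetical helper (`find_small_orfs`) that reports ATG-initiated ORFs on the forward strand only, within an assumed length range in codons. A real survey would also scan the reverse complement, consider alternative start codons, and then search for homologues in other genomes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_small_orfs(seq, min_codons=10, max_codons=50):
    """Return (start, end, frame) of ATG-initiated ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive (the stop codon is included);
    the length in codons counts the start codon but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon to the first in-frame stop.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        n_codons = (j - i) // 3
                        if min_codons <= n_codons <= max_codons:
                            orfs.append((i, j + 3, frame))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no stop downstream: no complete ORF left in this frame
            i += 3
    return orfs


# Toy sequence: ATG, two codons and a TAA stop, in frame 2.
print(find_small_orfs("CCATGAAAAAATAAGG", min_codons=1))
# [(2, 14, 2)]
```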
DNA topoisomerase IB (Topo IB) enzymes are ubiquitous in eukaryotes, where they provide the major DNA topoisomerase I activity. However, Topo IB sequences are also found in other lineages, such as archaea and bacteria, as well as in viruses. Given the large amount of sequence data available in public databases, this project aims to infer a robust Topo IB gene tree from a representative set of homologous sequences gathered across a broad taxonomic sample.
Regulation by phase variation and attenuation: looking for leader peptides containing repeats in the intergenic regions of streptococcal genomes
We recently described a novel regulatory mechanism combining phase variation and attenuation of two pilus operons in Streptococcus gallolyticus. Phase variation occurs through slipped-strand mispairing during replication, due to the presence of repeats in the leader peptide located immediately upstream of the pilus operon. We wonder whether the same mechanism applies to other bacterial genes.
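Screening intergenic regions for such leader peptides could start with a search for short tandem repeats. The following sketch, under assumed thresholds (repeat units of 1 to 6 nt, at least 4 consecutive copies), uses a regular expression with a backreference to report each repeat with its position, unit and copy number. The helper name and thresholds are hypothetical, and extracting the intergenic sequences and checking for a downstream leader peptide are left out.

```python
import re


def find_tandem_repeats(seq, max_unit=6, min_copies=4):
    """Return (start, end, unit, copies) for tandem repeats; coordinates are 0-based, end-exclusive."""
    # Lazy unit of 1..max_unit nt, followed by at least (min_copies - 1) further copies of it.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = (match.end() - match.start()) // len(unit)
        hits.append((match.start(), match.end(), unit, copies))
    return hits


# Hypothetical leader sequence carrying four copies of GAC.
print(find_tandem_repeats("TTGACGACGACGACTT"))
# [(2, 14, 'GAC', 4)]
```

The lazy quantifier makes the regex report the shortest repeating unit (for example `A` rather than `AA` in a homopolymer), which is usually what matters for slippage.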
Pasteur International Bioresources Network (PIBnet) bioinformatics: whole-genome sequencing of microbial agents for disease surveillance, outbreak investigation, epidemiology and population biology
The PIBnet initiative is a joint effort by 15 National Reference Centers (NRC), 8 Collaborating Centers of the World Health Organization, the Collection de l’Institut Pasteur & Cyanobacteria collection and the CIBU to modernize their activities, including collection management and microbial characterization approaches and technologies. Within this large concerted effort, a priority is to promote whole-genome sequencing (WGS) as the main approach for characterizing microbial agents for surveillance and outbreak investigation. Our ambition is for the Institut Pasteur to have shifted to WGS as a routine strain characterization method for epidemiological surveillance and outbreak investigation by the end of 2016. The target volume is 10,000 genomes a year. At the bioinformatics level, this requires implementing fast data processing tools, databases, genotyping schemes and methods to extract medically relevant information from genomic sequences (resistome, virulome).
Protein-sequence-based phylogenetic, temporal and geographic analysis of influenza samples collected before and after the 2009 epidemic.
The genome of C. tetani consists of a chromosome of approximately 2.8 Mb and a large plasmid (74 kbp) harboring the tetanus toxin gene. The genome of strain E88 was sequenced and annotated (PNAS 2003, 100:1316-1321). We have sequenced three additional strains and performed a first comparison of their genomes (Res Microbiol 2015, 166:326-331). Fourteen additional C. tetani strains have now been sequenced, including historical strains (1952-1968) and recent French clinical isolates, and the raw Illumina sequencing data are available. The chromosome and plasmid sequences will be compared: first, the sequence reads of each strain will be assembled; second, the chromosomes and plasmids of these 14 strains will be compared using a BLAST-based approach. Finally, a phylogenetic tree will be built to trace the evolution of this bacterium.
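For illustration only, the sketch below builds a rooted tree in Newick format from a pairwise distance matrix with UPGMA, one of the simplest distance-based methods. UPGMA assumes a roughly constant evolutionary rate and is shown here because it is short; the project may well rely on other methods (for example neighbor-joining or maximum likelihood on a core-genome alignment). The function name, taxon labels and distances are made up.

```python
def upgma(names, matrix):
    """Return a rooted UPGMA tree in Newick format from a symmetric distance matrix."""
    # Active clusters: id -> (Newick subtree, number of leaves, node height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    dist = {(i, j): matrix[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters (ties broken by the lowest ids).
        i, j = min(dist, key=lambda pair: (dist[pair], pair))
        tree_i, size_i, height_i = clusters.pop(i)
        tree_j, size_j, height_j = clusters.pop(j)
        height = dist.pop((i, j)) / 2
        subtree = f"({tree_i}:{height - height_i:g},{tree_j}:{height - height_j:g})"
        # Distance to the new cluster = size-weighted mean of distances to its two parts.
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = (subtree, size_i + size_j, height)
        next_id += 1
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


# Toy distances between four hypothetical strains.
names = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(names, matrix))
# (D:5,(C:3,(A:1,B:1):2):2);
```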
Background: PLA2 is known to regulate vesicle secretion in diverse eukaryotic cells. We are interested in determining the putative role of PLA2 in the secretion of apical vesicular organelles called micronemes and rhoptries in P. falciparum merozoites. PLA2 inhibitors such as 4-BPB are known to block growth of the related apicomplexan parasite Toxoplasma gondii. There are 3 annotated PLA2 genes in the P. falciparum genome database (Pf3D70209100, PF3D71358000, PF3D70924000). We would like to analyze these putative PLA2 genes using bioinformatics to support our drug development studies.
The first two cases of botulism due to Clostridium baratii type F in France were identified in November 2014, in the same family. Both cases required prolonged respiratory assistance. One of the patients had extremely high serum toxin levels and remained paralysed for two weeks. Investigations strongly supported the hypothesis of a common exposure during a family meal, with high-level contamination of the source. However, all analyses of leftover food remained negative. Clostridium baratii was isolated from stool specimens from the two patients. A second cluster of three cases of food-borne botulism due to Clostridium baratii type F occurred in France in August 2015. All cases required respiratory assistance. Consumption of a Bolognese sauce at the same restaurant was the likely source of contamination. Clostridium baratii was isolated both from stool specimens from the three patients and from the ground meat used to prepare the sauce. This is the second episode caused by this rare pathogen reported in France. One strain from the first cluster and four from the second have been sequenced (Illumina technology). A complete genome of a C. baratii strain (Sullivan) isolated from adult stool in 2007 in New York is available in GenBank (chromosome: CP006905; plasmid: CP006906).
We submitted an article about the diversity of Clostridium botulinum strains based on MLST analysis. One of the reviewers asked for further analysis of our NGS data.
Drug-resistance mutations emerge under selection. These mutations are transmitted, so patients can be infected with strains that are already resistant to some treatments. They pose considerable problems by limiting the range of available treatments, both for individuals and for the population in the long term. In the case of HIV, we studied the transmission of these mutations within the English population and showed that most of them were transmitted by treatment-naive patients who were unaware of their infection (Mourad et al. AIDS 2015). The method used was often ad hoc and used only part of the available data. The aim of this project is to develop rigorous models and efficient methods for estimating the parameters (transmission rates depending on donor characteristics, reversion rates, etc.).
Identification of new or unexpected pathogens, including viruses, bacteria, fungi and parasites associated with acute or progressive diseases
Microbial discovery remains a challenging task, with many unmet medical and public health needs. Deep sequencing has profoundly changed this field, whose challenges can be summarized in two questions: (i) which pathogens or combinations of pathogens are associated with diseases of unknown etiology, and (ii) among microbes infecting animal (including arthropod) reservoirs, which ones are able to infect large vertebrates, including humans. We are currently addressing these two questions, and our current request reflects Institut Pasteur's wish to increase its contribution to, and visibility in, this area, in particular in connection with hospitals and the Institut Pasteur International Network (IPIN). We expect to identify new microbes associated with human diseases, which should pave the way for basic research programs focusing on virulence mechanisms and host specificity, lead to phylogenetic and epidemiological studies (frequency of host infection, mode of transmission, etc.), and support the development of improved diagnostic tests for human infections. Our objective is also to contribute to the efforts of Institut Pasteur in the field of infectious diseases by building a pipeline, from sample to microbial identification, able to handle large cohorts of samples. This project is currently supported by the LABEX IBEID and the CITECH, and critically requires bioIT support, which justifies this application. Partners include several hospitals, notably Necker-Enfants Malades University Hospital for patients with progressive disease, several IPIN laboratories, as well as INRA and CIRAD for animal/arthropod reservoirs.
This project aims to explore novel protein interactions involved in protein folding in cyanobacteria. Slr0286 was previously identified as a protein that interacts with the D2 protein of photosystem II and affects its functional assembly and stability in Synechocystis sp. PCC6803. The gene encoding this protein appears to be in the same operon as slr0285 in all strains where both genes are found, which may suggest a functional relationship between their products. We would like to explore whether these genes co-evolved and whether there is evidence of interaction between Slr0285 and Slr0286.
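One classic way to test whether two protein families co-evolved is to compare their distance matrices computed over the same set of species, in the spirit of the mirrortree approach: a high correlation between the two matrices is taken as indirect evidence of co-evolution, and possibly of interaction. The minimal sketch below computes the Pearson correlation between the upper triangles of two matrices and assumes that both matrices list the same species in the same order. The function name and the toy matrices are hypothetical.

```python
from math import sqrt


def mirrortree_correlation(dist_a, dist_b):
    """Pearson correlation between the upper triangles of two species-by-species distance matrices."""
    n = len(dist_a)
    if n != len(dist_b) or n < 3:
        raise ValueError("matrices must cover the same species (at least 3)")
    # Flatten the upper triangle (i < j); both matrices must use the same species order.
    xs = [dist_a[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [dist_b[i][j] for i in range(n) for j in range(i + 1, n)]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical distances between Slr0285 (and Slr0286) orthologues in the same 4 species.
d_0285 = [[0, 1, 4, 5], [1, 0, 3, 4], [4, 3, 0, 2], [5, 4, 2, 0]]
d_0286 = [[0, 2, 7, 9], [2, 0, 6, 8], [7, 6, 0, 3], [9, 8, 3, 0]]
print(round(mirrortree_correlation(d_0285, d_0286), 3))
```

A high correlation can also arise simply because both families follow the species tree, so such scores are usually interpreted against a background of unrelated protein pairs.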
Samples were collected in the field from infected patients admitted to the Ebola Treatment Center (ETC) of Macenta, Guinea. Deep sequencing was performed to detect majority and minority variants. The aim of this study is to compare the quasispecies of patients who survived with those of patients who died, and to obtain a detailed description of EBOV evolution within an ETC in order to identify phylogenetic clusters or transmission chains.
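As a rough illustration of majority/minority variant detection, the sketch below takes per-position base counts (for example derived from a pileup of aligned reads), reports the majority base, and lists minority bases above a frequency threshold. The depth and frequency thresholds, the function name and the counts are assumptions; real quasispecies analyses also need to model sequencing errors, strand bias and PCR artefacts.

```python
def call_variants(base_counts, min_depth=100, min_freq=0.02):
    """For each sufficiently covered position, return (position, majority base, minority variants).

    base_counts: one dict per position (1-based), mapping base -> read count.
    Minority variants are (base, frequency) pairs with frequency >= min_freq.
    """
    calls = []
    for pos, counts in enumerate(base_counts, start=1):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything about minority variants
        # Rank bases by decreasing count (alphabetical order breaks ties).
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        majority = ranked[0][0]
        minority = [(base, n / depth) for base, n in ranked[1:] if n / depth >= min_freq]
        calls.append((pos, majority, minority))
    return calls


# Hypothetical counts at three positions; the third is below the depth cut-off.
pileup = [{"A": 990, "G": 10}, {"C": 500, "T": 480, "A": 20}, {"G": 50}]
for call in call_variants(pileup):
    print(call)
```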
Genomic determinants for initiation and length of natural antisense transcripts in a compact eukaryotic genome and phylogenetic analysis of related Entamoeba species
Entamoeba histolytica is a protozoan parasite and an amitochondriate pathogenic amoeba that causes amoebiasis (dysentery and liver abscess) in humans. In addition to E. histolytica, several other species infect the human intestine without causing disease, most commonly E. dispar and occasionally E. moshkovskii. A phylogenetically close species, E. invadens, which infects snakes, is used as a cellular model for Entamoeba cyst formation.
Supported by the French National Research Agency (ANR-10-GENM-0011), we developed a project that first studied the transcriptional landscape of pathogenic E. histolytica. Among other results, we discovered that 60% of ORFs have antisense RNAs (natural antisense transcripts, NATs) that map to the 3′ end of genes. Their regulation is modified upon environmental changes and is essentially governed by genomic sequences within the very short intragenic region of the amoeba genome. Second, we have started to apply comparative genomics and transcriptomics approaches to understand phenotypic differences between Entamoeba species, in particular with respect to virulence.
Whole-genome sequencing is revolutionizing the surveillance of foodborne and waterborne bacterial pathogens. The speed with which public health laboratories obtain information after symptom onset, and the regular sharing of this information between public health laboratories and epidemiologists, are critical for using it successfully to detect outbreaks early and to identify their source. To this end, this project aims to provide the most relevant bacterial genomic information in a timely manner by integrating several validated in silico tools (core genome MLST, CRISPR, O and H molecular serotyping, etc.) into a single automated analysis pipeline.
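To make the cgMLST step more concrete, the sketch below compares allele profiles (locus name mapped to allele number, with `None` for a missing call) by counting loci with different alleles, then groups isolates into single-linkage clusters whose members are connected by pairwise differences of at most a given number of alleles. The function names, threshold and toy profiles are hypothetical; actual cgMLST schemes and outbreak thresholds are species-specific.

```python
def cgmlst_distance(profile_a, profile_b):
    """Return (allele differences, loci compared), skipping loci missing in either profile."""
    differences = compared = 0
    for locus in profile_a.keys() & profile_b.keys():
        allele_a, allele_b = profile_a[locus], profile_b[locus]
        if allele_a is None or allele_b is None:
            continue  # missing call: the locus is not counted
        compared += 1
        differences += allele_a != allele_b
    return differences, compared


def single_linkage_clusters(profiles, threshold):
    """Group isolates linked by chains of pairwise distances <= threshold (union-find)."""
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            differences, _ = cgmlst_distance(profiles[a], profiles[b])
            if differences <= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical 3-locus profiles (real cgMLST schemes have hundreds to thousands of loci).
profiles = {
    "iso1": {"locus1": 1, "locus2": 1, "locus3": 1},
    "iso2": {"locus1": 1, "locus2": 2, "locus3": None},
    "iso3": {"locus1": 4, "locus2": 5, "locus3": 6},
}
print(single_linkage_clusters(profiles, threshold=1))
# [['iso1', 'iso2'], ['iso3']]
```

Skipping missing loci (rather than counting them as differences) keeps incomplete assemblies from looking artificially distant, at the cost of comparing pairs over slightly different locus sets.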
Membrane fusion is an essential process in all forms of life, and fusion proteins are responsible for catalysing this reaction by forcing the two membranes against each other by undergoing a fusogenic
Antimalarial drug resistance in Africa: A comprehensive molecular analysis of the emergence of artemisinin resistant parasites in Africa
We are involved, in collaboration with the WHO, in SaMARA, which aims to detect the emergence of antimalarial drug resistance in Africa. Samples (dried blood spots) are collected from the Nantio
In this project, we study the ecological diversification of Klebsiella pneumoniae and closely related species. Using comparative genomics, we want to identify the pattern of genome adaptation to different e
We tested the effect of DNMT and RNMT inhibitors on the survival of Leishmania parasites and would like to identify their potential targets.
Evaluation of the mutation rate per site and dN/dS in the genomes of 4 Yersinia enterocolitica strains isolated from the same patient over 14 years.
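A simple way to obtain a first estimate of the per-site mutation rate from serial isolates is to regress the number of SNPs separating each isolate from the earliest one against its sampling time, similar in spirit to a root-to-tip regression, and to divide the slope by the number of sites compared. The sketch below assumes SNP counts have already been obtained on a fixed set of sites; the function name and numbers are made up, and dN/dS estimation is not covered.

```python
def per_site_rate(sampling_years, snp_counts, n_sites):
    """Least-squares slope of SNP count vs sampling time, divided by the number of sites.

    sampling_years: sampling dates (in years) of the serial isolates.
    snp_counts: SNPs between each isolate and the earliest one (0 for the earliest).
    Returns substitutions per site per year.
    """
    n = len(sampling_years)
    if n < 2 or n != len(snp_counts):
        raise ValueError("need at least two isolates with one SNP count each")
    mean_t = sum(sampling_years) / n
    mean_s = sum(snp_counts) / n
    s_tt = sum((t - mean_t) ** 2 for t in sampling_years)
    s_ts = sum((t - mean_t) * (s - mean_s) for t, s in zip(sampling_years, snp_counts))
    slope = s_ts / s_tt  # SNPs accumulated per year
    return slope / n_sites


# Made-up example: 4 isolates over 14 years, 4.5 Mb of comparable sites.
rate = per_site_rate([0, 5, 9, 14], [0, 3, 5, 8], 4_500_000)
print(f"{rate:.2e} substitutions/site/year")
```

With only a handful of isolates from one patient, such an estimate is sensitive to within-host diversity and to which isolate is taken as the reference, so it is best treated as an order of magnitude.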