Hub members have expertise covering most fields of bioinformatics and biostatistics. Below is a non-exhaustive list of these areas of expertise.
Searched keyword: Sequence analysis
Related people (18)
Emna joined the C3BI in 2016 and has worked actively in the IGDA platform on research and education. She is now also part of the Viral Populations and Pathogenesis Unit (PVP).
Genome assembly, Sequence analysis, Program development, Data integration, Read mapping, LIMS, Parallel computing, Gene prediction, Shotgun metagenomics
Developing and evaluating bioinformatic tools for next-generation sequencing data and for genome analysis and comparison. Specialties: Genome & Transcriptome Bioinformatics
Data management, Data Visualization, Genomics, Non coding RNA, Sequence analysis, Transcriptomics, Genome analysis, Biostatistics, Program development, Scientific computing, Data and text mining, Biosensors and biomarkers, Epidemiology and public health
- Identification of non-coding RNAs under the control of the PerR regulators (Nadia BENAROUDJ - Biology of Spirochetes) - Closed
- Tissue-resident stromal cell heterogeneity (Lucie PEDUTO - Stroma, Inflammation and Tissue Repair) - Closed
- Role of small non coding RNAs in the adaptive response to oxidative stress in pathogenic Leptospira (NADIA BENAROUDJ - Biology of Spirochetes) - Closed
As a computational biologist I have been involved in various projects seeking to answer different biological questions. Those projects have allowed me to define my main research interest, namely the evolutionary study of the emergence, storage and modulation of information in biological systems assisted by computational methods. During my research career I have acquired extensive experience in the analysis of sequence data at the DNA and protein level. I’m trained both in NGS bioinformatic protocols (ChIP-seq, ATAC-seq, RNA-seq, genome assembly) and fine detail sequence analysis. Most importantly, I have gained proficiency in the use of the statistical models that are at the basis of the quantitative analysis of low and high throughput sequence data. Additionally, my experience as a lecturer and instructor has taught me that training researchers about the formal basis of bioinformatic methodologies is the key for a successful collaboration between wet and dry lab. Likewise, I have gained valuable skills by working within two international consortia (TARA Oceans project and TRANSNET): the ability to collaborate with multidisciplinary groups and to coordinate younger researchers.
Algorithmics, Genomics, Sequence analysis, Transcriptomics, Genome analysis, Genetics, Evolution, Interactomics
- fliC locus of Y. pestis (Mara CARLONI - Yersinia) - In Progress
- Mechanisms defining functional heterogeneity of anatomically distinct myogenic populations: insights from single nuclei-ATAC-seq data (Glenda COMAI - Department of Developmental and Stem Cell Biology) - In Progress
- basic alignment/visualisation pipeline (Pablo NAVARRO - Epigenetics of Stem Cells) - In Progress
Professional Experience
- Today – Institut Pasteur – HUB Team
- 2009 – today – Institut Pasteur – Bioinformatician
- 2006 – 2009 – CNRS, Orsay – Institut Génétique et Microbiologie – PostDoc
- 2002 – 2006 – INRIA, Grenoble – Ph.D
- 2000 – 2002 – INRIA, Action Helix, Grenoble – Expert engineer
- 1999 – 2000 – Infobiogen – Université EVE, Evry – Engineer
Education
- 2002 – 2006 Thesis Paris VI, INRIA, Grenoble
- 1999 – 2000 DESS Informatique Appliquée à la Biologie, UPMC
- 1997 – 1998 Maîtrise Biologie cellulaire et Physiologie animale, UPMC
Data management, Sequence analysis, Structural bioinformatics, Database, Program development, Scientific computing, LIMS
- Common and phylogenetically widespread coding for peptides by bacterial small RNAs – Follow up of a project regarding its journal review (Benno SCHWIKOWSKI - Systems Biology) - Closed
- A novel MacSyFinder module for detection of bacterial capsule systems on the future Galaxy platform. (Eduardo ROCHA - Microbial Evolutionary Genomics) - Closed
- Development of a web application and new functionalities for the maintenance and curation of iPPI-DB (Olivier SPERANDIO - Center for Innovation and Technological Research) - Closed
After a PhD in informatics on graph analysis (metabolic networks and sRNA-mRNA interaction graphs) at the LaBRI (Université de Bordeaux), I joined the DSIMB team (INTS) for a post-doc on structural modeling. I then did a second post-doc at Metagenopolis – INRA Jouy-en-Josas, where I was introduced to the analysis of metagenomic data. I was recruited at the HUB in 2015, and since then I have pursued the development of methods for processing metagenomic data, combining sequencing data processing, statistics, protein structural modeling and graph analysis.
Algorithmics, Clustering, Genome assembly, Genomics, Metabolomics, Modeling, Non coding RNA, Sequence analysis, Structural bioinformatics, Targeted metagenomics, Database, Genome analysis, Biostatistics, Program development, Scientific computing, Databases and ontologies, Exploratory data analysis, Data and text mining, Illumina HiSeq, Comparative metagenomics, Read mapping, Illumina MiSeq, Sequence homology analysis, Gene prediction, Multidimensional data analysis, Sequencing, Shotgun metagenomics
- Evaluation of a novel mouse model for Primary Antibody Deficiency (PAD) (Lise HUNAULT - Antibodies in Therapy and Pathology) - In Progress
- Measles virus type 1 infection disturbs the mitochondrial network leading to type I interferon production through the RNA polymerase III/RIG-I pathway (Jean-Pierre VARTANIAN - Department of Virology) - Pending
- Comparative analysis of choanoflagellate proteomic data (Thibaut BRUNET - Other) - Closed
After a PhD in Microbiology on bacterial toxin-antitoxin systems at the Free University of Brussels, I joined the Institut Pasteur for a 3-year postdoc in Eduardo Rocha’s lab. During this period, I performed comparative genomics and phylogenetic analyses of bacterial conjugation and type IV secretion systems. Then, I worked for 2 years in Olivier Tenaillon’s team on the modelling and evolution of organismal complexity. I joined the HUB in 2015, and I am involved in phylogenetic and comparative genomics projects.
Genomics, Phylogenetics, Sequence analysis, Genome analysis, Genetics, Evolution, Population genetics
- Centrosome and basal body function in human parasites (Philippe BASTIN - Trypanosome Cell Biology) - New
- Evaluation of the mutation rate per site and dN/dS in the genomes of Yersinia enterocolitica (Cyril SAVIN - Yersinia) - Awaiting Publication
- Phylogenetic analysis of HHD-PDZ containing proteins (Nicolas WOLFF - Channel Receptors) - Closed
After a Master's degree in Genetics at Magistère Européen de Génétique, Paris Diderot, I did a second Master's in bioinformatics at the University of Nantes, where I focused on mapping strategies for allele-specific analysis at the bioinformatics platform of Institut Curie. I then joined Institut Pasteur to work on an ELIXIR project related to the bio.tools registry, developing a dedicated tool and participating in several workshops and hackathons. As an engineer of the Bioinformatics and Biostatistics Hub, I am involved in several projects ranging from differential analysis of RNA-seq data to metagenomics. I am also in charge of the maintenance of the Galaxy Pasteur instance.
ChIP-seq, Epigenomics, Genomics, Sequence analysis, Program development, Databases and ontologies, Software development and engineering, Genetics, Data integration, Read mapping, Workflow and pipeline development, Confocal Microscopy
- Impact of gut microbiota on lipid metabolism (Grégoire CHEVALIER - Microenvironment and Immunity) - Closed
- Analysis of IFITM RNA levels in various cell types and tissues (Olivier SCHWARTZ - Virus and Immunity) - Closed
- Channels in metagenomics data (Delarue MARC - Structural Dynamics of Macromolecules) - Closed + 1 project
Bernd Jagla received his PhD in bioinformatics (department of Biology, Chemistry, and Pharmacy) from the Free University in Berlin, Germany in 1999. Before joining the Institut Pasteur, he worked for almost ten years in New York City, including as an associate research scientist in the Joint Centers for System Biology (Columbia University) and at the Columbia University Screening Center led by Dr J.E. Rothman. He joined the Institut Pasteur in 2009 to take charge of the bioinformatic needs at the Transcriptome et Epigenome platform, focusing on Next Generation Sequencing. As of 2016 he is a member of the C3BI – HUB Team detached to the Human immunology center (CIH) and provides support for cytometry, next generation sequencing, and microarray data analysis. His areas of interest include quality assurance, data analysis and visualization at the facility. He also has strong expertise in developing algorithms for function prediction from sequence data, image analysis, analysis of mass spectrometry data, and workflow management systems. While at Pasteur he developed: KNIME extensions for Next Generation Sequencing; Post Alignment Visualization and Characterization of High-Throughput Sequencing Experiments; Post Alignment statistics of Illumina reads.
Algorithmics, ChIP-seq, Data management, Data Visualization, Image analysis, Machine learning, Sequence analysis, Database, Genome analysis, Biostatistics, Program development, Scientific computing, Data and text mining, Illumina HiSeq, Graphics and Image Processing, Illumina MiSeq, High Throughput Screening, Flow cytometry/cell sorting, Pac Bio
- Identifying new population(s) of NK cells involved in memory to bacterial infection (Melanie HAMON - Chromatin and Infection) - In Progress
- A long-term mission for an assigned CIH-embedded bioinformatician to provide bioinformatic support to the CIH community (Milena HASAN - Department of Immunology) - In Progress
Professional Experience
- 2015 – . – Institut Pasteur, Paris, France – Unit: Bioinformatics and Biostatistics HUB
- 2012 – 2015 – Institut Pasteur, Paris, France – Unit: Molecular Genetics of Yeasts, Supervisor: Prof. B. Dujon
- 2012 – Institut Pasteur, Paris, France – Unit: Integrated Mycobacterial Pathogenomics, Supervisor: Dr. R. Brosch
Education
- 2012 – MSc. Bioinformatics – Université Paris Diderot (Paris VII)
Genome assembly, Sequence analysis, Genome analysis, Orthology and paralogy analysis, Read mapping, Sequence homology analysis, DNA structure analysis, Genome rearrangements, Motifs and patterns detection
- Comparative genomics of Listeria monocytogenes isolates (Marc LECUIT - Biology of Infection) - Awaiting Publication
- Duplications in bacteriophage genomes. (Luisa DE SORDI - Molecular Biology of Gene in Extremophiles) - Closed
- De novo sequencing and analysis of three unassigned species of non tuberculous mycobacteria. (RIM GHARBI - Integrated Mycobacterial Pathogenomics) - Closed
After a PhD in Biology in 2011 on population genetics and phylogeography of amazing little amphipods (Crangonyx, Crymostygius) at the University of Reykjavik (Iceland), I pursued my interest in Bioinformatics and Evolutionary Biology in various post-docs in Spain (MNCN Madrid, UB Barcelona). During this time, I investigated transcriptomic landscapes for various non-model species (groups Conus, Junco and Caecilians) using de novo assemblies and participated in the development of TRUFA, a web platform for de novo RNA-seq analysis. In July 2016, I joined the Revive Consortium and the Epigenetic Regulation unit at Pasteur Institute, where my main focus was transcriptomic and epigenetic analyses on various topics using short- and long-read technologies, with a special interest in the detection of alternative splicing events. I joined the Bioinformatics and Biostatistics Hub in January 2018. My latest interests are long-read technologies, alternative splicing and achieving reproducibility in Bioinformatics using workflow managers, container technologies and literate programming.
Data management, Data Visualization, Sequence analysis, Transcriptomics, Web development, Genome analysis, Program development, Exploratory data analysis, Software development and engineering, Genetics, Evolution, Read mapping, Workflow and pipeline development, Population genetics, Motifs and patterns detection, Grid and cloud computing
Human, Insect or arthropod, Other animal, Anopheles gambiae (African malaria mosquito), Mouse
- Build a software to decipher Gephyrin alternative transcripts obtained with long read sequencing (allemand ERIC - Epigenetic Regulation) - Closed
- Transcriptomics of Anopheles – Plasmodium vivax interactions towards identification of malaria transmission blocking targets (Catherine BOURGOUIN - Functional Genetics of Infectious Diseases) - Closed
- Mapping of Enhancers from transcriptome data (Christian MUCHARDT - Epigenetic Regulation) - Closed
After a Master's degree in bioinformatics and biostatistics, I did a PhD in computer science / bioinformatics at University Paris-Sud (now part of University Paris-Saclay), where I worked on the integration and analysis of comparative genomics data. After a postdoc in Lausanne, Switzerland, where I worked on small-RNA sequencing data, I joined GenoSplice, where I was responsible for the development of bioinformatics projects related to next generation sequencing. I joined Institut Pasteur in Nov. 2015 to work in the Evolutionary Bioinformatics Unit and participate in the development of new tools and algorithms able to efficiently tackle the ever-increasing amount of sequencing data.
Algorithmics, Data management, Phylogenetics, Sequence analysis, Database, Genome analysis, Program development, Scientific computing, Databases and ontologies, Sequencing, Workflow and pipeline development
After a PhD in biochemistry on rapeseed proteins, during which I developed my first automated scripts for data processing and analysis, I joined the Danone research center to develop multivariate models for predicting milk protein composition using infrared spectrometry.
As I was already developing my own informatics tools, I decided to join the informatics for biology course of the Institut Pasteur in 2007. At the end of the course I was recruited by the Institute and joined the “génétique des interactions macromoléculaires” unit of Alain Jacquier. Within this group, I learned to handle sequencing data and developed processing and analysis tools using Python and R. I also created a genome browser and database system for storing, retrieving and visualizing microarray data. After 8 years in Alain Jacquier’s lab, I joined the Hub of bioinformatics and biostatistics as co-head of the team.
Clustering, Data management, Sequence analysis, Transcriptomics, Web development, Database, Genome analysis, Program development, Scientific computing, Exploratory data analysis, Data and text mining, Illumina HiSeq, Read mapping, LIMS, Illumina MiSeq, High Throughput Screening, Multidimensional data analysis, Workflow and pipeline development, Ribosome profiling, Motifs and patterns detection
- SHERLOCK4HAT - WP1.1 (Brice ROTUREAU - Group: Trypanosome transmission) - Closed
- Bring the Genolist servers such as LegioList, TuberclListe, Colibri, etc. back into service (Carmen BUCHRIESER - Biology Of Intracellular Bacteria) - Closed
- Identification of eukaryotic 5'UTRs (Arnaud ECHARD - Membrane Traffic and Cell Division) - Closed
Professional Experience Today - Institut Pasteur, Paris - HUB Team 2017 - Bioinformatician 2001 - 2017 - Institut Pasteur, Paris; CIB/DSI - Engineer 1997 - 2000 Thesis: NMR and molecular modelling, CEA, Saclay
Data management, Sequence analysis, Transcriptomics, Genome analysis, Program development, Scientific computing
Fungi, Candida albicans, Cryptococcus gattii, Cryptococcus neoformans
- Viral metagenomic in noctule bats from East Europe (Laurent DACHEUX - Lyssavirus Dynamics and Host Adaptation) - In Progress
- Viral metagenomic in Chinese bats and their associated ectoparasites. (Laurent DACHEUX - Lyssavirus Dynamics and Host Adaptation) - In Progress
- Characterization of Salmonella mutants (FRANCOISE NOREL - Biochemistry of Macromolecular Interactions) - In Progress
After graduating in “Structural Genomics and Bioinformatics”, I worked for almost 6 years at the Genoscope (CEA) in the LABGeM team, within the microbial annotation platform MicroScope. I specifically focused on functional annotation and on the prediction and reconstruction of microbial metabolic pathways, through pipeline implementation, database modeling and web interface development. More broadly, working on the MicroScope platform allowed me to tackle the whole annotation process, from genome assembly and gene prediction to network reconstruction. I also performed several comparative genomics analyses. As a member of the “Hub team”, I now take part in various projects linked to HTS data, on different subjects (lncRNAs and stem cells, HIV integration and DNA structure, ribosomal protein genes and genome evolution, Natural Antisense Transcripts in compact genomes…).
Data management, Genomics, Sequence analysis, Web development, Database, Genome analysis, Databases and ontologies, Orthology and paralogy analysis, Read mapping, Sequence homology analysis, Gene prediction
- Setup of bioinformatic pipelines for paleo(meta)genomics (Nicolás RASCOVAN - Department of Genomes and Genetics) - In Progress
- Multiparametric immunophenotyping of whole blood in IFN-treated multiple sclerosis patients (Priyanka DEVI - Cytokine Signaling) - Closed
- Genomic DNA sequencing of Burkholderia ambifaria Q53 strain isolated from peanut rhizospheric soil (Mathilde BEN ASSAYA - Structural Microbiology) - Closed
Dr. Natalia Pietrosemoli is an Engineer with an M. Sc. in Modeling and Simulation of Complex Realities from the International Center for Theoretical Physics, ICTP and the International School of Advanced Studies, SISSA (Trieste, Italy). During her M. Sc. internships she mostly worked on modeling, optimization, combinatorics and information theory applied to medical imaging. In 2012 she obtained a Ph. D in Computational Biology from the School of Bioengineering of Rice University (Houston, TX, US), where she specialized in computational structural biology and functional genomics. Her doctoral thesis, “Protein functional features extracted from primary sequences: a focus on disordered regions”, contributed to a better understanding of the functional and evolutionary role of intrinsic disorder in protein plasticity, complexity and adaptation to stress conditions. As part of her Ph. D., Natalia was a visiting scholar in two labs in Madrid: the Structural Computational Biology Group at the Spanish National Cancer Research Centre (CNIO), where she mainly worked on sequence analysis and the functional-structural relationships of proteins, and the Computational Systems Biology Group at the Spanish National Centre for Biotechnology (CNB-CSIC), where she studied the functional implications of intrinsically disordered proteins at the genomic level for several organisms, collaborating with different experimental and theoretical groups. In 2013, she joined the Swiss Institute of Bioinformatics as a postdoctoral fellow in the Bioinformatics Core Facility. Her main project was the molecular classification of a rare type of lymphoma, which involved integrating transcriptomic, clinical and mutational data to identify molecular markers for classification, diagnosis and prognosis. This work was performed in collaboration with the Pathology Institute at the University Hospital of Lausanne (CHUV).
In November of 2015 Natalia joined the Hub Team @ Pasteur C3BI as a Senior Bioinformatician. Natalia is especially interested in the integrative analysis of different omics data, both at large-scale and for small datasets, and loves collaborating in interdisciplinary environments and having feedback from her fellow experimental colleagues. Currently, she’s coordinating several projects performing functional and pathway analysis at the genomic level. By grouping genes, proteins and other biological molecules into the pathways they are involved in, the complexity of the analyses is significantly reduced, while the explanatory power increases with respect to having a list of differentially expressed genes or proteins.
Algorithmics, Data management, Genomics, Image analysis, Machine learning, Modeling, Proteomics, Sequence analysis, Structural bioinformatics, Transcriptomics, Database, Genome analysis, Biostatistics, Scientific computing, Databases and ontologies, Application of mathematics in sciences, Data and text mining, Genetics, Graphics and Image Processing, Biosensors and biomarkers, Clinical research, Cell biology and developmental biology, Interactomics, Bioimage analysis
- Exploring pathogenic mechanisms of chronic inflammatory disease: unresolved issues in IL-23/IL-17 biology (YAHIA HANANE - Immunoregulation) - Pending
- Study of the role of cyclic dimeric guanosine mono-phosphate (c-di-GMP) in the regulation of virulence and biofilm formation in Leptospira interrogans (Gregoire DAVIGNON - Other) - Pending
- Global BioID-based SARS-CoV-2 proteins proximal interactome unveils novel ties between viral polypeptides and host factors involved in multiple COVID19-associated mechanisms (Yves JACOB - Molecular Genetics of RNA Viruses) - In Progress
Najwa was a postdoctoral fellow funded by the PTR project OM-Nega of the Institut Pasteur. Since January 2018 she has been the permanent bioinformatician of the group, as part of the Hub team C3BI of the Institut Pasteur.
Genomics, Sequence analysis, Database, Genome analysis, Evolution, Orthology and paralogy analysis
Hugo Varet is a biostatistician engineer from the Ensai (Ecole Nationale de la Statistique et de l’Analyse de l’Information) and was recruited in 2013 by the Transcriptome & Epigenome Platform of the Biomics Pole. In late 2014 he obtained a permanent position at the Bioinformatics & Biostatistics Hub and was detached to the platform to continue the statistical analyses of RNA-Seq data and to develop R pipelines and Shiny applications that help with this task. One of them is named SARTools and is available on GitHub: https://github.com/PF2-pasteur-fr/SARTools. In December 2019 he left the Biomics Platform and joined the Bioinformatics & Biostatistics Hub as a core member.
Metabolomics, Modeling, Sequence analysis, Statistical inference, Transcriptomics, Biostatistics, Scientific computing, Application of mathematics in sciences, Exploratory data analysis, High Throughput Screening, Clinical research
- Evaluation in cellulo of the impact of insecticide usage on arbovirus population (VALLET THOMAS - Viral Populations and Pathogenesis) - In Progress
- Analysis of the molecular pathways induced by the activation of the Nod2 receptor by MDP in hypothalamic neurons (Ilana GABANYI - Perception and Memory) - Pending
- Characterization of a broad spectrum chemical inhibitor targeting the endocytic pathway to prevent bacterial intoxications. (Eléa PAILLARES - Bacterial Toxins) - In Progress
Related projects (41)
The phylogenetic position and status of “giant viruses”, formerly called NucleoCytoplasmic Large DNA viruses (NCLDV) or putative order Megavirales, are controversial. Many preliminary phylogenetic analyses have been published, but their presentation is usually strongly biased by the authors' preconceptions about the nature of giant viruses. Our own preliminary analyses suggest that giant viruses are indeed ancient (they predate the last universal eukaryotic ancestor) and have possibly provided important functions to emerging eukaryotic cells (e.g. DNA topoisomerase activities). The number of giant virus genomes has recently increased dramatically, opening new opportunities to study their position in the “universal tree of life” and their evolutionary relationships with eukaryotes. The aim of the project is to perform an exhaustive phylogenetic analysis of all giant virus proteins with eukaryotic (archaeal/bacterial) homologues to (i) test the monophyly of giant viruses, (ii) determine their contribution to early eukaryotic evolution, and (iii) determine whether some giant virus proteins can be useful to root the eukaryotic tree. We need the help of a bioinformatics colleague with good expertise in building phylogenetic trees from large data sets using different methods of tree construction and robustness evaluation. This work will be complemented by a systematic search for significant indels (insertions/deletions) in the alignments, carried out by two members of the BMGE team (Patrick Forterre and Morgan Gaia).
Background: The opportunistic pathogen Candida glabrata is a member of the Saccharomycetaceae yeasts. Like its close relative Saccharomyces cerevisiae, it underwent a whole-genome duplication followed by an extensive loss of genes. Its genome contains a large number of very long tandem repeats, called Megasatellites. In order to determine the whole replication program of the C. glabrata genome and its general chromosomal organization, we used deep-sequencing and Chromosome Conformation Capture (3C) experiments. Results: We identified 253 replication fork origins genome-wide. Centromeres, HML and HMR loci and most histone genes are replicated early, whereas natural chromosomal breakpoints are located in late replicating regions. In addition, 275 Autonomously Replicating Sequences (ARS) were identified during ARS-capture experiments, and their relative fitness was determined during growth competition. Analysis of ARSs allowed us to identify a 17 bp consensus, similar to the S. cerevisiae ARS Consensus Sequence (ACS) but slightly more constrained. Megasatellites are not in close proximity to replication origins or termini. Using chromosome conformation capture, we also show that early origins tend to cluster whereas non-subtelomeric megasatellites do not cluster in the yeast nucleus. Conclusions: Despite a shorter cell cycle, the C. glabrata replication program shares unexpectedly striking similarities with that of S. cerevisiae, in spite of their large evolutionary distance and the presence of highly repetitive large tandem repeats in C. glabrata. No correlation could be found between the replication program and megasatellites, suggesting that their formation and propagation might not be directly caused by replication fork initiation or termination.
We have recently identified around 500 long non-coding RNAs that follow a very precise expression pattern in undifferentiated mouse Embryonic Stem cells. In order to assess whether any of these molecules are functionally relevant for the biology of ES cells, we will perform a functional screening based on the CRISPR technology. We plan to use a modified version of the CRISPR-CAS9 technology, the CRISPRon system, in which (i) the guide-RNAs (gRNAs) will target the promoters of our candidate lncRNAs (at least 5 gRNAs per promoter) and (ii) an enzymatically-inactive CAS9 will be engineered to recruit 10 molecules of the potent VP64 transactivator. This will make it possible to upregulate the targeted lncRNA from its endogenous locus.
The recent analysis of the Cryptococcus neoformans transcriptomes revealed the presence of thousands of lncRNAs. In these yeasts, different types of lncRNAs seem to exist: those that are antisense to coding genes (the NATs), those that are located between coding genes (the lincRNAs), and others that seem to result from alternative transcription start site selection. We identified growth conditions under which the expression of some of them is regulated. We have also identified some genes implicated in the regulation of some of these lncRNAs. This project deals with the characterisation of these lncRNAs, the analysis of their regulation and the study of their function in the biology and virulence of this pathogenic yeast.
HIV-1 replication requires the integration of the viral genome into the cell genome. A viral-encoded enzyme, integrase (IN), performs this critical step of infection and is a promising target for anti-viral therapeutics. Although the catalytic properties of INs are well characterized, the mechanisms responsible for their site selectivity are still under investigation. Several cellular proteins, such as the LEDGF/p75 transcription co-activator, the RNA polymerase II machinery, nuclear pore proteins and specific modified histones, have been proposed to be involved in IN selectivity at a genomic level, but the underlying molecular mechanisms remain to be demonstrated. In addition, structural parameters of the target DNA helix (curvature, flexibility, topology) are proposed to regulate IN selectivity at a local level. Our aims are to study the role of these different parameters in IN selectivity, using both in vitro and in vivo approaches. In vitro, we will map integration sites on various target DNA substrates (naked DNA or chromatin, minicircles, plasmids with different topologies, transcribed templates) and will test the effect of purified proteins suspected to regulate IN selectivity. In vivo, integration sites will be mapped in cells depleted of these suspected regulators or in cells incubated with drugs targeting enzymes involved in transcription, DNA topology or histone modifications. Integration sites will be mapped using published or “home-made” protocols and the sites will be compared with DNA structural parameters, nucleosome positions, histone modifications or transcriptional parameters (published maps). Bioinformatics tools are crucial for these correlative and statistical analyses of integration sites. Our project relies on complementary in vivo, in vitro and in silico approaches. It should establish molecular and mechanistic rules of HIV-1 integration selectivity that could serve in the development of new antiviral strategies and of safer gene therapy vectors.
Gli proteins are transcriptional regulators involved in the Shh signalling pathway. Gli3 binding sites on DNA have been defined in the mouse genome by ChIP-chip experiments (Vokes et al., 2008). We are working on the Msx transcription factor family, and have strong evidence of interactions between Msx1 and Gli3 at the protein level. This might be reflected in Msx1 binding to DNA in the vicinity of Gli3 binding sites. However, Msx1 binding sites are ill-defined. One strategy to test this hypothesis is to compare sequences around Gli3 binding sites (around 4200 such sequences in published data) and look for conserved stretches that might define Msx1 binding sites.
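As a rough illustration of what "looking for conserved stretches" could mean in practice, the sketch below counts fixed-length words (k-mers) that occur in a large fraction of a set of sequences. It is only a minimal sketch under stated assumptions: the function name `shared_kmers`, the parameters `k` and `min_fraction`, and the toy sequences are hypothetical and do not come from the project, and a real analysis would more likely rely on a dedicated motif-discovery tool and a background model.

```python
from collections import Counter


def shared_kmers(sequences, k, min_fraction):
    """Return k-mers present in at least min_fraction of the sequences.

    Each sequence contributes at most once per k-mer, so the count is the
    number of sequences containing that k-mer (hypothetical helper).
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # A set removes repeated occurrences within one sequence.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    threshold = min_fraction * len(sequences)
    return {kmer: n for kmer, n in counts.items() if n >= threshold}


# Toy sequences standing in for regions flanking binding sites (made up).
toy_regions = ["AACTAATTGG", "GGCTAATTCC", "TTTTTTTTTT"]
print(shared_kmers(toy_regions, k=6, min_fraction=0.6))
```

Exact k-mer matching is the simplest possible choice here; it ignores degenerate positions and strand orientation, which a real binding-site search would need to handle.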
The Sudanese L. donovani strain Ld1SA is the most important experimental reference strain in our field and has been used for various systems level analysis (DNAseq, RNAseq, proteomics). However, all these analyses are strongly compromised by the fact that the current L. donovani genome reference strain LdBPK is from Nepal and very different from the Sudanese isolate (only 60% of DNAseq reads can be mapped).
Functional analysis of human ubiquitin-like proteins that interact with influenza virus proteins
Collect statistics on variants found in different strains of influenza virus. Determine the effects of these variants on the protein sequence.
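To make the idea of a variant effect on the protein sequence concrete, here is a minimal sketch that classifies a single-nucleotide substitution in a coding sequence as synonymous, missense or nonsense using the standard genetic code. The function name `variant_effect`, the 0-based coordinate convention and the toy sequence are assumptions made for illustration; the sketch assumes an in-frame DNA coding sequence and does not handle indels or substitutions in a stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def variant_effect(cds, pos, alt):
    """Classify a substitution at 0-based position pos of an in-frame CDS."""
    offset = pos % 3
    start = pos - offset
    ref_codon = cds[start:start + 3].upper()
    # Rebuild the codon with the alternative base at the variant position.
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


# Toy coding sequence: ATG GCT TGG TAA (Met Ala Trp stop), made up.
toy_cds = "ATGGCTTGGTAA"
print(variant_effect(toy_cds, 5, "C"))  # GCT -> GCC
print(variant_effect(toy_cds, 3, "C"))  # GCT -> CCT
print(variant_effect(toy_cds, 8, "A"))  # TGG -> TGA
```

Counting the returned labels per strain, for example with `collections.Counter`, would give the kind of summary statistics the project description mentions.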
A long-term mission for an assigned CIH-embedded bioinformatician to provide bioinformatic support to the CIH community
The Center for Human Immunology (CIH) supports researchers involved in translational research projects by providing access to 16 different cutting edge technologies. Currently, the CIH hosts over 60 scientific projects coming from 8 departments of the Institut Pasteur and 5 external teams. In order to respond to the growing needs of these projects in the area of single cell analysis, the CIH has introduced a significant number of single-cell/single-molecule technologies over the past 2-3 years. These new technologies, such as the Personal Genome Machine (PGM) and Ion Proton sequencers, iSCAN microarray scanner, Nanostring technology for transcriptomics profiling and real-time PCR machine BioMark, give rise to large datasets with high dimensionality. This trend in data complexity also holds for flow cytometry technologies (currently reaching over 20 parameters per cell). The exploration of this data is generally beyond the scope of scientists involved in translational research projects. In order to maximize the research outcomes obtained from the analysis of these rich datasets, and to ensure that the full potential of our technologies is available to the users of the CIH, we require on-site bioinformatics support. A CIH-embedded bioinformatician would: 1) design and implement standard analysis pipelines for each of the data-rich technologies of the CIH; 2) provide regular ‘bioinformatics clinics’ to allow scientists to customize standard pipelines to their specific needs; 3) run trainings on the ‘R software’ platform and other data analysis tools (such as Qlucore) of interest for the CIH users. The objective would be to empower the users to run exploratory analyses by themselves, and to teach good practices in terms of data management and data analysis.
BioHub LeiSHield project. This proposal summarizes the contribution of the BioHub to the LeiSHield action that may be carried out by a single BioHub Leishmania coordinator (Giovanni Bussotti). The coordinator will be involved in the following actions: 1) Establish the link between LeiSHield members and the BioHub team for all questions regarding data analysis and interpretation. The coordinator will present the bioinformatics needs of the LeiSHield partners to the BioHub. Short (easy) tasks will be answered directly (following the BioHub open-door strategy). For more involved tasks, partners will be asked to submit projects via the C3BI web site. 2) Coordinate the setup of an HTseq analysis pipeline, including quality control, read mapping, determination of CNVs and SNPs, and data visualization using a combination of tools available at the BioHub, such as SyntView and Listeriomics. A link to Cedric Notredame will be established, as scripts for Leishmania have been created there. 3) Oversee the submission of DNA from the different LeiSHield WPs to the IP HTseq facility, follow the progress, store the acquired data, and dispatch the datasets to the corresponding WP leaders. This will be coordinated with the Biomix infrastructure. 4) Apply the HTseq analysis pipeline (see point 2) on selected data sets for defined work packages, including WP4 (“Analysis of newly isolated anthroponotic L. donovani s.l. strains from Cyprus and correlation of genotypic profiles to tropism and drug resistance”), WP6 (“Population genetics of Brazilian L. infantum isolates from endemic areas presenting distinct transmission cycle”), WP7 (“Leishmania donovani genome sequence diversity and disease tropism in the Sudan”), and WP9 (“Systems-wide analysis of Leishmania genomic and transcriptomic adaptation”). 5) Co-organize a course on HTseq data visualization (June 2016) with members of the BioHub team.
Chromosome amplification is commonly used by Leishmania during adaptation to its environment. In this context, it is challenging to look for genes relevant to parasite virulence, attenuation or drug resistance. To circumvent this chromosomal amplification, a cosmid approach (CoSeq) has been chosen to select for genes that provide a fitness gain to Leishmania donovani parasites in culture and in the animal. A cosmid library has therefore been generated with genomic DNA from the parasites, and it needs to be sequenced to check genome coverage before transfection into the parasites. The transfected parasites will then be injected into animals or subjected to different culture conditions. Only those transfected with cosmids providing an advantage under the studied conditions will be selected and will replicate. These cosmids will be extracted from the parasites and sequenced to reveal genes relevant to parasite survival. The C3BI would be involved in the analysis of the sequencing data obtained from the PF1 (retrieving the data, mapping the reads to the Leishmania genome, estimating genome coverage, listing the genes selected for a given condition...).
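The genome coverage estimate mentioned above can be sketched as follows, assuming the mapped reads have already been reduced to half-open (start, end) intervals on a single reference sequence. The function name `coverage_stats` and the toy intervals are hypothetical; a real analysis would read the intervals from the alignment files produced by the read mapper.

```python
def coverage_stats(ref_length, alignments):
    """Return (breadth, mean depth) of coverage for one reference sequence.

    `alignments` is an iterable of 0-based, half-open (start, end) intervals.
    Breadth is the fraction of positions covered by at least one read.
    """
    if ref_length <= 0:
        raise ValueError("reference length must be positive")
    # Difference array: +1 where a read starts, -1 just after it ends.
    diff = [0] * (ref_length + 1)
    for start, end in alignments:
        start = max(0, start)
        end = min(ref_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth = 0
    covered = 0
    total_depth = 0
    for i in range(ref_length):
        depth += diff[i]          # running sum gives the depth at position i
        if depth > 0:
            covered += 1
        total_depth += depth
    return covered / ref_length, total_depth / ref_length


breadth, mean_depth = coverage_stats(10, [(0, 5), (3, 8)])
print(breadth, mean_depth)
```

A low breadth for the sequenced cosmid library would indicate regions of the genome that are not represented before transfection.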
The first two cases in France of botulism due to Clostridium baratii type F were identified in November 2014, in the same family. Both cases required prolonged respiratory assistance. One of the cases had extremely high toxin serum levels and remained paralysed for two weeks. Investigations strongly supported the hypothesis of a common exposure during a family meal with high-level contamination of the source. However, all analyses of leftover food remained negative. Clostridium baratii was isolated from stool specimens from the two patients. A second cluster of three cases of food-borne botulism due to Clostridium baratii type F occurred in France in August 2015. All cases required respiratory assistance. Consumption of a Bolognese sauce at the same restaurant was the likely source of contamination. Clostridium baratii was isolated both from stool specimens from the three patients and from the ground meat used to prepare the sauce. This is the second episode reported in France caused by this rare pathogen. One strain from the first cluster and four from the second one have been sequenced (Illumina technology). A complete genome of a C. baratii strain (Sullivan) isolated from adult stool in 2007 in New York is available in GenBank (chromosome: CP006905; plasmid: CP006906).
One project of our laboratory is to characterize the function of small proteins specifically expressed in non-growing Salmonella cells under the tight control of SigmaS. Our current work focuses on small proteins that are secreted or targeted to the membrane, since their characterization might reveal novel aspects of membrane functions and secretion in persistent bacteria, including pathogens. The proposed project is an extension of a former project on the phylogenetic distribution of the small ORFs, and aims at analyzing the genomic context of orthologs.
Integration of the viral reverse-transcribed genome into the genome of infected cells is an essential step of retroviral replication and is performed by a virus-encoded enzyme named integrase (IN). In the case of HIV-1, IN is a new and efficient antiviral target. The selectivity of this enzyme for its cellular genomic sites is also a major parameter of HIV replication and is regulated by several cellular parameters. One of them is chromatin, and different levels of this nucleoprotein complex are involved in the regulation of IN selectivity. Using in vitro integration assays established by our team and collaborators, we have studied this regulation at two levels of chromatin architecture: large poly-nucleosome templates (Botbol et al., 2008; Lesbats et al., 2011; Benleulmi et al., 2015; Naughtin et al., 2015) and nucleosome-induced DNA curvature mimicked by DNA minicircles (Pasi et al., 2016). Our present project is to study IN selectivity in mononucleosomes (MNs). These MNs will be used as target substrates for integration, and the roles of MN structure, histone modifications and IN cofactors will be studied. Results obtained in vitro will be compared with structural data obtained by molecular modeling and with integration sites observed in infected cells. This project will benefit from our expertise in integration in chromatin templates and from a previous collaboration with the C3BI on the analysis of integration sites (Pasi, M., Mornico, D., Volant, S., et al., 2016). This project is funded by the ANRS.
Mycolactone, the lipid toxin produced by Mycobacterium ulcerans, has recently been shown to target the Sec61 translocon, blocking protein translocation across the ER membrane. As Sec61 has also been proposed as a critical mediator of cross-presentation in dendritic cells, this project aims to analyse the effects of mycolactone on the proteome of dendritic cells.
We previously showed that humanized immune system (HIS) mice generated in Balb/c Rag2-/-γc-/- SIRPNOD (BRGS) recipients are susceptible to HIV-1 infection (X4 and R5 isolates) and maintain circulating HIV-1 in the plasma, resulting in a dramatic depletion of human CD4+ T cells. We also characterized features of HIV physiopathology in this model. Human thymocyte subsets developing in the thymus of HIS mice appear phenotypically normal, but in the periphery the T cell repertoire is restricted compared with that of human peripheral blood T cells. This negatively impacts on the ability of HIS mice to generate antigen-specific human immune responses when mice are vaccinated with protein antigens or following infection with lymphotropic viruses such as HIV. One likely explanation for these functional deficiencies involves the fact that human T cells are selected intrathymically by mouse MHC molecules and that naïve T cells in peripheral lymphoid organs interact primarily with mouse DC (as human DC development in HIS mice is limited). As a first line of improvement, we recently generated a novel mouse model by crossing our BRGS mice with the HLA-A*02-HHD class I transgenic mice and the HLA-DRB1*15 class II transgenic mice, resulting in BRGS-A2DR2 mice. Following intra-hepatic injection of these mice with MHC-matched CD34+ stem cells we observed increased engraftment, with faster kinetics. Moreover BRGS-A2DR2 HIS mice have an increased T cell development leading to a more equilibrated B/T and CD4/CD8 phenotype. We showed that BRGS-A2DR2 HIS mice were able to sustain replication of HIV R5 virus as the BRGS hosts. Viremia was similar in a first phase and then lower in a second phase in BRGS-A2DR2 compared to BRGS HIS mice, which could be a consequence of a better quality of the immune response. However, the viremia reached a similar plateau in the last phase. We propose to study the impact of the immune res
Linked to project #7283, managed by Thomas Bigot, this proposal aims to detect antibiotic resistance genes in whole-genome sequence data of the bacteria deposited and stored at the CIP. The CARD library (http://arpcard.mcmaster.ca) would be used to screen for common acquired resistance genes. Detection of point mutations in housekeeping genes and/or transporters would require specific queries from a FASTA file that I would keep continuously updated.
The adenylate cyclase (CyaA) produced by B. pertussis, the causative agent of whooping cough, is one of the major virulence factors of this organism. CyaA plays an important role in the early stages of respiratory tract colonization by B. pertussis. This toxin uses an original intoxication mechanism: secreted by the virulent bacteria, it is able to invade eukaryotic target cells through a unique but poorly understood mechanism that involves a direct translocation of the catalytic domain across the plasma membrane. CyaA is a 1706-residue-long protein organized in a modular fashion. The ATP-cyclizing, calmodulin-activated catalytic domain (ACD) is located in the 400 amino-terminal residues. Once secreted by the bacteria, the toxin binds calcium in the extracellular milieu and refolds into a functional state. Then, CyaA translocates its catalytic domain directly across the plasma membrane from the extracellular medium to the host cell cytoplasm where, upon activation by endogenous calmodulin, it increases the concentration of cAMP to supraphysiological levels that ultimately lead to cell death. Recently, we succeeded in refolding CyaA into a stable, monomeric form that is fully folded and functional (at variance with all prior procedures, in which the polypeptides were largely aggregated upon urea removal). Both calcium and molecular confinement are mandatory to produce the monomeric state, and CyaA acylation also strongly contributes to the refolding process. We further showed that the monomeric preparation displays hemolytic and cytotoxic activities, suggesting that the monomer is the genuine, physiologically active form of the toxin. However, despite recent advances in the understanding of CyaA, its mechanism of cell intoxication, in particular the membrane translocation step, remains poorly understood from a fundamental perspective. The description of the molecular events occurring prior to and during the translocation of the catalytic domain across the lipi
Our lab is interested in the assembly of the inner viral membrane of Vaccinia virus and in the molecular regulators (proteins and lipids) that play a role in this complex and unique process of virion morphogenesis.
Verification and mapping of modified corynebacteria genomes. Search for suppressors arising from kinase gene deletions (pknA; pknB ...)
Genomic determinants for initiation and length of natural antisense transcripts in a compact eukaryotic genome and phylogenetic analysis of related Entamoeba species
Entamoeba histolytica is a protozoan parasite and an amitochondriate pathogenic amoeba, which causes amoebiasis (dysentery and liver abscess) in humans. In addition to E. histolytica, several other species infect the human intestine without causing disease, most commonly E. dispar and occasionally E. moshkovskii. A phylogenetically close species, E. invadens, which infects snakes, is used as a cellular model for Entamoeba cyst formation.
Supported by the National Agency for Research (ANR-10-GENM-0011), we developed a project to first study the transcriptional landscape of pathogenic E. histolytica. Among the results, we discovered that 60% of ORFs present antisense RNAs (NATs) that map to the 3′ end of genes. Their regulation is modified upon environmental changes. The regulation of NATs is essentially governed by genomic sequences within the very short intragenic region of the amoeba genome. Secondly, we have started to conduct comparative genomics and transcriptomics approaches to understand phenotypic differences between Entamoeba species, in particular with respect to virulence.
Phylogenetic analysis of protein families to find mutations that increase their thermal stability and expression levels in a cost- and labor-effective way.
Harmonin Homology Domains (HHD) are protein-protein interaction domains primarily identified in a few proteins involved in hearing and vision. Only six HHD-containing proteins encompassing 9 HHD domains have been identified so far in mammals. Four of these proteins are neuronal and also contain PDZ domains, exhibiting different HHD-PDZ modular organisations. This project aims at 1) searching for orthologs with HHD domains, 2) establishing the phylogenetic tree of HHD-containing proteins and 3) understanding how the various patterns of HHD-PDZ modules in the previously mentioned neuronal proteins originated.
Phage phiVC8 is a lytic phage of Vibrio cholerae O1, classical biotype. This phage has a narrow host range and we are interested in understanding why. We have selected several mutants of the Vibrio strain that are resistant to the phage. Resistance could be due to several mechanisms, which we want to understand.
Membrane fusion is an essential process in all forms of life, and fusion proteins are responsible for catalysing this reaction: by undergoing a fusogenic conformational change, they force the two membranes against each other. Viral fusion proteins are the best studied and are classified into three groups: class I, with a central trimeric alpha-helical coiled coil and the fusion peptide at the N-terminal end; class II, which are folded as three beta-sheet-rich domains with an internal fusion loop; and class III, with a trimeric alpha-helical coiled coil and an internal fusion loop at the tip of a beta-sheet-rich domain. Despite their totally different structures, the proteins from the three classes use the same overall fusion mechanism. The syncytins, which are class I fusion proteins captured from a retrovirus that integrated into the germ line of a primitive mammal more than 25 million years ago, are responsible for the development of the placenta. Recently, orthologs of the class II viral fusion proteins have been found in eukaryotic organisms: the epithelial fusion factor (EFF) family in nematodes, responsible for skin formation, and HAP2-GCS1 proteins, responsible for gamete fusion during fertilization in green algae, higher plants, unicellular protozoa, cnidarians, hemichordates, and arthropods. The fact that HAP2 was identified in the main branches of the eukaryotic taxa, with the exception of fungi, suggests that it should also be present in these organisms. In particular, HAP2 was identified in some insects but not in all of them, and has not been detected in vertebrates. This project is aimed at the detection of amino acid sequences compatible with class II fusion proteins among the membrane proteins present in vertebrates and yeast.
Alpacas belong to the family of camelids, which has the peculiarity of producing two kinds of antibodies: conventional antibodies made of heavy and light chains, and particular single-chain antibodies made of heavy chains only. In the latter, the antigen recognition domain, called nanobody or VHH, is monomeric; it has high affinity for the antigen and is smaller and more stable than that of conventional antibodies. The gold-standard technique to isolate nanobodies against a specific target is phage display. This method is very laborious and time-consuming, so we sought to identify and set up a faster and simpler system for the high-throughput screening of VHHs. We would also like to compare methods in terms of repertoire size, sequence variability and representation. To this end, we want to sequence by NGS the VHH repertoires selected by the different approaches.
There exists a broad biodiversity inside the Listeria monocytogenes species, which can be summarized by the existence of evolutionary lineages and more than 100 clonal complexes (CCs or clones) based on core genome multilocus sequence typing (cgMLST), which are geographically and temporally widespread. We aim to link genomic markers to temporal, geographical and sampling origin in order to better understand the ecology and evolution of Listeria monocytogenes.
We passaged EV71 in cell culture, followed by RNA-seq analysis. The sequences were analysed for the presence of internal deletions within the viral genome. Such deletions can contribute to non-homologous and homologous recombination within enteroviruses, which can have positive effects, such as purging deleterious mutations or overcoming host restrictions. On the other hand, such deletions could result in the formation of defective interfering particles (DIs), which have the ability to interfere with wild-type virus infections and thus reduce viral loads in ongoing infections. We aim to get a better understanding of these events during viral replication, which can help us identify DIs and explore these special forms of virus genome deletions as potential therapeutic agents.
We want to determine the function of a protein conserved among certain phages. Homology searches yield about a hundred hits that clearly share some motifs; however, they remain 'putative', without an assigned function. Searching for known motifs does not give any hits either. We would like to use a C3BI database dedicated to viral proteins to find a possible clue about the protein's role.
Ensuring good storage conditions is fundamental for the long-term conservation of bacterial strains. Freeze-drying, freezing, and storage in liquid nitrogen are the methods most frequently used for long-term storage. The effectiveness of these methods in ensuring the stability of the genetic characteristics of bacterial strains over long-term preservation needs to be evaluated. The submitted project aims to determine the impact of long-term preservation at the genomic level and whether the different batches (frozen, in liquid nitrogen or freeze-dried) of the strain Campylobacter fetus CIP 53.96 are the same.
We wish to study the genome-wide integration preferences of integron cassettes in the MG1655 Escherichia coli genome.
The aim of this study is to shed light on novel mechanisms that increase antimicrobial resistance (AMR) in Gram-negative bacteria. In order to gain insight into new mechanisms involved in the development of AMR, we study the strategies undertaken by Gram-negative bacteria in response to low antibiotic stress, and particularly the role of codon bias. Our preliminary results point to a role of codon usage in survival at low concentrations of antibiotics targeting various functions. Differences in codon decoding may impact adaptation to and survival in changing environments. We will explore the proteome of V. cholerae to search for protein groups with particular codon biases, expected to impact their translation in the presence of antibiotics.
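A first pass at searching for genes with particular codon biases could look like the sketch below, which counts codons in a coding sequence and reports the fraction made up by a chosen set of codons. This is only an illustrative starting point: the function names, the toy sequence and the choice of codons are assumptions, and the project may rely on a different measure of codon bias.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons of a coding sequence (trailing partial codon ignored)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_fraction(cds, codons_of_interest):
    """Fraction of the codons in `cds` that belong to `codons_of_interest`."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[c] for c in codons_of_interest) / total


# Toy example: ATG GAA GAG GAA TAA, with GAA as the (hypothetical) codon of interest.
print(codon_counts("ATGGAAGAGGAATAA"))
print(codon_fraction("ATGGAAGAGGAATAA", {"GAA"}))
```

Applying `codon_fraction` to every coding sequence of a genome and ranking the genes by the result would highlight candidates enriched in the codons of interest.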
Centrosomes are the main microtubule organizing center of eukaryotic cells, with critical roles in cell division, polarity, signaling and structure. In most cells, one or both centrioles act as a basal body (BB), nucleating microtubules to form cilia or flagella, sensory and motile organelles of vital importance for a wide range of biological functions. Notorious deadly diseases such as cancer, microcephaly and ciliopathies correlate with dysregulation in the number and/or structure of the centrosome/BB. Defects in centriolar proteins also impact cell division and flagellar function of parasitic protists. Notably, T. gondii can assemble flagella during its sexual cycle within the cat’s enteroepithelial tissue, a life stage that is largely unattainable in vitro. The state of the art of the field points at the centrosome and basal body of apicomplexan and trypanosomatid parasites as potentially rich sources of novel therapeutic targets to fight parasitic diseases. However, their molecular composition and the regulation of their biogenesis remain ill-described. Although a number of structural components appear to be conserved between parasitic protozoa and their vertebrate hosts, the absence of conserved homologs of regulatory components suggests that their biogenesis is likely controlled by divergent triggers of unknown targets. Within the framework of a funded ACIP grant (076-2017), this team pursued the characterization of the centrosome composition in T. gondii, and explored the localization of newly identified principles in T. brucei. This proposal focuses on deciphering the role of the newly identified proteins in the biology of the centrosome in Toxoplasma gondii, as a model for the phylum Apicomplexa, and on analyzing the role of these conserved proteins in basal body biogenesis and function in Trypanosoma brucei. Based on our preliminary identification of novel centrosomal/basal body components and the powerful tools available in our model organisms, we now propose: 1. To analyze the phylogenetic distribution and functional domains of 20 novel proteins of T. gondii through bioinformatic approaches. 2. To assess the localization of these 20 proteins, and the function and cell cycle dynamics of those localized to the centrosome, in T. gondii. 3. To characterize the function of a protein complex linking the centrosome to nascent daughter cells in T. gondii. 4. To characterize the role of 3 novel T. brucei protein homologs in basal body biology.
We are studying a specific genomic signature and would like to evaluate how frequent this signature is in phage genomes.
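If the signature can be expressed as a short exact sequence motif (an assumption, since the signature itself is not described here), its frequency in a genome can be estimated by counting occurrences on both strands and normalizing by genome length, as in the sketch below. The function names and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(seq, motif):
    """Count occurrences of `motif` in `seq`, including overlapping ones."""
    count = 0
    pos = seq.find(motif)
    while pos != -1:
        count += 1
        pos = seq.find(motif, pos + 1)
    return count


def motif_frequency(genome, motif):
    """Return (hits on both strands, hits per kilobase) for an exact motif."""
    genome = genome.upper()
    motif = motif.upper()
    if not genome or not motif:
        return 0, 0.0
    hits = count_overlapping(genome, motif)
    rc = reverse_complement(motif)
    if rc != motif:               # avoid double counting palindromic motifs
        hits += count_overlapping(genome, rc)
    return hits, 1000 * hits / len(genome)


print(motif_frequency("GATCAAGATCTTTGATC", "GATC"))
```

Running the same function over a collection of phage genomes would give a per-genome frequency that can be compared with the frequency expected from base composition alone.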
The fliC locus codes for the FliC protein, the flagellin molecule that constitutes the flagellar filament of motile bacteria such as Y. pseudotuberculosis. Until now, it was commonly accepted that transcription of this locus was completely repressed in the non-motile Yersinia pestis (a recent clone of Y. pseudotuberculosis, which still harbors flagellar genes), but transcriptomic analysis showed that it is actually active. Moreover, we showed that deletion of the fliC locus in Y. pestis was associated with an attenuation of virulence in the mouse model of bubonic plague. However, for the time being we have no proof that the FliC protein is indeed synthesized. Recently, a more detailed analysis of the transcriptomic data showed that a large part of the detected transcripts actually came from transcription of the fliC locus in the antisense orientation. Therefore, we would like to explore the hypothesis that the molecule responsible for the observed phenotype is an antisense RNA with regulatory functions, rather than the gene product of the fliC ORF, i.e. the FliC protein.