Hub members have a wide range of expertise, covering most fields in bioinformatics and biostatistics. Below is a non-exhaustive list of these areas of expertise.
Searched keyword: Workflow and pipeline development
Related people (10)
After a Master's degree in Genetics at Magistère Européen de Génétique, Paris Diderot, I completed a second Master's degree in bioinformatics at the University of Nantes, where I focused on mapping strategies for allele-specific analysis at the bioinformatics platform of Institut Curie. I then joined Institut Pasteur to work on an ELIXIR project related to the bio.tools registry, developing a dedicated tool and participating in several workshops and hackathons. As an engineer of the Bioinformatics and Biostatistics Hub, I am involved in several projects ranging from differential analysis of RNA-seq data to metagenomics. I am also in charge of the maintenance of the Galaxy Pasteur instance.
ChIP-seq, Epigenomics, Genomics, Sequence analysis, Program development, Databases and ontologies, Software development and engineering, Genetics, Data integration, Read mapping, Workflow and pipeline development, Confocal Microscopy
- Impact of gut microbiota on lipid metabolism (Grégoire CHEVALIER - Microenvironment and Immunity) - Closed
- Analysis of IFITM RNA levels in various cell types and tissues (Olivier SCHWARTZ - Virus and Immunity) - Closed
- Channels in metagenomics data (Delarue MARC - Structural Dynamics of Macromolecules) - Closed + 1 project
After a PhD in Biology in 2011 on population genetics and phylogeography of amazing little amphipods (Crangonyx, Crymostygius) at the University of Reykjavik (Iceland), I pursued my interest in Bioinformatics and Evolutionary Biology in various post-docs in Spain (MNCN Madrid, UB Barcelona). During this time, I investigated transcriptomic landscapes for various non-model species (groups Conus, Junco and Caecilians) using de novo assemblies and participated in the development of TRUFA, a web platform for de novo RNA-seq analysis. In July 2016, I joined the Revive Consortium and the Epigenetic Regulation unit at Pasteur Institute, where my main focus was transcriptomic and epigenetic analyses on various topics using short- and long-read technologies, with a special interest in the detection of alternative splicing events. I joined the Bioinformatics and Biostatistics Hub in January 2018. My latest interests are long-read technologies, alternative splicing and achieving reproducibility in Bioinformatics using workflow managers, container technologies and literate programming.
Data management, Data Visualization, Sequence analysis, Transcriptomics, Web development, Genome analysis, Program development, Exploratory data analysis, Software development and engineering, Genetics, Evolution, Read mapping, Workflow and pipeline development, Population genetics, Motifs and patterns detection, Grid and cloud computing
Human, Insect or arthropod, Other animal, Anopheles gambiae (African malaria mosquito), Mouse
- Build software to decipher Gephyrin alternative transcripts obtained with long read sequencing (allemand ERIC - Epigenetic Regulation) - Closed
- Transcriptomics of Anopheles – Plasmodium vivax interactions towards identification of malaria transmission blocking targets (Catherine BOURGOUIN - Functional Genetics of Infectious Diseases) - Closed
- Mapping of Enhancers from transcriptome data (Christian MUCHARDT - Epigenetic Regulation) - Closed
Rachel Legendre is a bioinformatics engineer. She completed her Master's degree through a two-year apprenticeship at INRA in Jouy-en-Josas, in the Animal Genetics department. She was involved in a project aiming at the detection and expression analysis of micro-RNAs involved in an equine disease. In 2012, she joined the Genomic, Structure and Translation Team at Paris-Sud (Paris XI) university. She worked principally on Ribosome Profiling data analysis, a new technique that identifies the position of the ribosome on the mRNA at the nucleotide level. Since November 2015, she has worked at Institut Pasteur. For four years, she was seconded to the Biomics Platform, where she was in charge of the bioinformatics analyses for transcriptomics and epigenomics projects. She was also involved in Long Reads (PacBio and Nanopore) developments with other bioinformaticians of Biomics. In November 2019, she joined the Hub of Bioinformatics and Biostatistics, and more precisely the Genome Organization Regulation and Expression group.
Algorithmics, ChIP-seq, Epigenomics, Non coding RNA, Transcriptomics, Genome analysis, Program development, Scientific computing, Software development and engineering, Illumina HiSeq, Read mapping, Sequencing, Workflow and pipeline development, Chromatin accessibility assays, Pac Bio, Ribosome profiling
Bacteria, Fungi, Parasite, Human, Insect or arthropod, Other animal
- ChIP-seq identification of IRF8 binding site (Ludivine GRZELAK - Virus and Immunity) - Awaiting Publication
- Exploring pathogenic mechanisms of chronic inflammatory disease: unresolved issues in IL-23/IL-17 biology (YAHIA HANANE - Immunoregulation) - In Progress
- Identification of factors influencing the activity of bacteriophage within the gut of mammals (Devon CONTI - Other) - In Progress
After a Master's degree in bioinformatics and biostatistics, I did a PhD in computer science / bioinformatics at University Paris-Sud (now University Paris-Saclay), where I worked on the integration and analysis of comparative genomics data. After a postdoc in Lausanne, Switzerland, where I worked on small-RNA sequencing data, I joined GenoSplice, where I was responsible for the development of bioinformatics projects related to next generation sequencing. I joined Institut Pasteur in Nov. 2015 to work in the Evolutionary Bioinformatics Unit and participate in the development of new tools and algorithms that can efficiently tackle the ever-increasing amount of sequencing data.
Algorithmics, Data management, Phylogenetics, Sequence analysis, Database, Genome analysis, Program development, Scientific computing, Databases and ontologies, Sequencing, Workflow and pipeline development
I obtained a PhD in phylogeny in 2008 at the Muséum National d’Histoire Naturelle in Paris, then worked as a post-doc in Torino (Italy, 2009 – 2011) and Faro (Portugal, 2011 – 2013), where I worked on methodological aspects of phylogeny. In 2013, I was hired as a research engineer in bioinformatics at the Institut de Génétique Humaine in Montpellier, where I wrote tools to analyse high-throughput sequencing data, especially small RNA-seq. This is also the kind of work I have been doing at Institut Pasteur since 2016. I enjoy programming in Python, I’m interested in evolutionary biology, and I find teaching the UNIX command-line and other practical computer skills a rewarding activity. I’m also particularly involved in a course introducing PhD students (and sometimes other staff at Institut Pasteur) to R programming and basic descriptive statistics. The course material is available online and can hopefully be studied autonomously: https://hub-courses.pages.pasteur.fr/R_pasteur_phd/First_steps_RStudio.html One of my main activities is the development of automated data analysis workflows using Snakemake. My published work is available here: http://www.normalesup.org/~bli/useful.html
Genomics, Non coding RNA, Transcriptomics, Software development and engineering, Genetics, Workflow and pipeline development
Insect or arthropod, Other animal, Drosophila melanogaster (Fruit fly), C. elegans
- Codon Usage Bias Analysis in Vibrio (Marie-Eve KENNEDY-VAL - Bacterial Genome Plasticity) - In Progress
- Gene conversion and allelic selection drive L. donovani genomic adaptation in experimental Sand fly infection (Gerald SPAETH - Molecular Parasitology and Signaling) - In Progress
- The LeiSHield-MATI consortium: Investigating genomic adaptation of Leishmania parasites in endemic areas (Gerald SPAETH - Molecular Parasitology and Signaling) - In Progress
After a PhD in the biochemistry of rapeseed proteins, during which I developed my first automated scripts for data processing and analysis, I joined the Danone research center to develop multivariate models for predicting milk protein composition using infrared spectrometry.
As I was already developing my own informatics tools, I decided to take the Informatics for Biology course at Institut Pasteur in 2007. At the end of the course, I was recruited by the Institute and joined the “génétique des interactions macromoléculaires” unit of Alain Jacquier. Within this group, I learned to handle sequencing data and developed processing and analysis tools using Python and R. I also created a genome browser and database system for storing, retrieving and visualizing microarray data. After 8 years in Alain Jacquier’s lab, I joined the Hub of Bioinformatics and Biostatistics as co-head of the team.
Clustering, Data management, Sequence analysis, Transcriptomics, Web development, Database, Genome analysis, Program development, Scientific computing, Exploratory data analysis, Data and text mining, Illumina HiSeq, Read mapping, LIMS, Illumina MiSeq, High Throughput Screening, Multidimensional data analysis, Workflow and pipeline development, Ribosome profiling, Motifs and patterns detection
- SHERLOCK4HAT - WP1.1 (Brice ROTUREAU - Group: Trypanosome transmission) - Closed
- Bring the Genolist servers such as LegioList, TuberclListe, Colibri, etc. back into service (Carmen BUCHRIESER - Biology Of Intracellular Bacteria) - Closed
- Identification of eukaryotic 5'UTRs (Arnaud ECHARD - Membrane Traffic and Cell Division) - Closed
After a Master's degree in Genome Analysis and Molecular Modeling at Denis Diderot University, I did a PhD in NMR / bioinformatics at Denis Diderot University, where I worked on the development and use of a software package named DaDiModO, which uses SAXS data and RDC/NMR data to calculate models of structural proteins. After a postdoc aiming to adapt the ARIA software to run on a computing grid, in the Structural Bioinformatic Team at Institut Pasteur in collaboration with IBCP, I joined the CIB/DSI Team, where I was responsible for the development of bioinformatics projects and the deployment, maintenance and evolution of the Pasteur Galaxy server. I joined the Hub/C3BI team in 2017 as a research engineer, where I’m involved in several projects in areas such as structural bioinformatics, software and web development. I am also in charge of the maintenance of the Galaxy Pasteur instance.
Data management, Galaxy, Structural bioinformatics, Web development, Database, Program development, Scientific computing, Databases and ontologies, Workflow and pipeline development, Grid and cloud computing
- SatelliteFinder (Jorge SOUSA - Department of Genomes and Genetics, Microbial Evolutionary Genomics) - In Progress
- Development of a secure API for ARIAweb (Benjamin BARDIAUX - Structural Bioinformatics) - In Progress
- Development of a web server to calculate functional binding sites using Deep Learning (Olivier SPERANDIO - Structural Bioinformatics) - In Progress
Data management, Data Visualization, Web development, Database, Program development, Databases and ontologies, Software development and engineering, Data integration, Workflow and pipeline development
- An online database of RNA-small molecules complexes for rational drug design (Massimiliano BONOMI - Structural Bioinformatics) - Closed
- Development of a contributor management webpage for iPPI-DB (Olivier SPERANDIO - Structural Bioinformatics) - In Progress
- JASS 2: Integrating functional annotation to a multi-trait GWAS web application (HANNA JULIENNE - Statistical Genetics) - In Progress
In 2012, I completed my Master's degree at the MicroScope Platform located at Genoscope (the French National Sequencing Center). I was involved in a project on the management of evolution projects which rely on Next Generation Sequencing (NGS) technologies to decipher the dynamics of genomic changes as well as the molecular bases and mechanisms underlying the adaptive evolution of micro-organisms (Remigi et al. 2014). In November 2014, I joined the Bioinformatics and Biostatistics HUB at Institut Pasteur. I participated in the creation and updates of the C3BI website. I joined the WINTER group, where I’m in charge of web and interface development projects. I have completed UX design training to add extra value to my front-end development skills. I design and develop user-oriented bioinformatics tools and interfaces.
Data Visualization, Web development, Database, Genome analysis, Scientific computing, Databases and ontologies, Software development and engineering, Workflow and pipeline development
- User experience design for Oncodash (Dreo JOHANN - Systems Biology) - In Progress
- Monitoring tool for scientists who have received MAASCC career guidance (Marion GUESSOUM - Other) - Pending
- An online database of RNA-small molecules complexes for rational drug design (Massimiliano BONOMI - Structural Bioinformatics) - Closed
A computer scientist by training, I am applying this knowledge to solve biological problems and am particularly interested in modelling of biological systems, knowledge inference, ontologies and data visualisation.
Algorithmics, Data Visualization, Metabolomics, Modeling, Pathway Analysis, Phylogenetics, Systems Biology, Tool Development, Database, Program development, Scientific computing, Databases and ontologies, Application of mathematics in sciences, Software development and engineering, Data and text mining, Evolution, Data integration, Graph theory and analysis, Workflow and pipeline development, Discrete and numerical optimization
Virus, Human Immunodeficiency virus (HIV)
- Modeling mitochondrial metabolism in dormant Cryptococcus neoformans (Benjamin HOMMEL - Molecular Mycology) - Closed
- Measles virus protein C interplay with cellular apoptotic pathways; applications for cancer treatment (Alice MEIGNIÉ - Viral Genomics and Vaccination) - Closed
- Spread of HIV resistance mutations: models and estimation methods (Olivier GASCUEL - Evolutionary Bioinformatics) - Closed
Related projects (9)
Characterization of the role of Argonaute proteins in regulating germline gene expression at the transcriptional and the post-transcriptional levels.
This research project focuses on characterizing the role of small RNAs and their associated Argonaute proteins in the transcriptional and post-transcriptional regulation of germline gene expression. Using the nematode C. elegans, we have recently shown that one of the germline-expressed Argonaute proteins, CSR-1, promotes germline transcription. However, CSR-1 also possesses an endonucleolytic activity that might participate in post-transcriptional silencing. Therefore, two possible functions of the protein might regulate the germline transcriptome: 1) CSR-1 promotes specific germline transcription programs in the nucleus, and 2) it negatively regulates the expression of target transcripts in the cytoplasm. To gain mechanistic insights into these two functions, we aim to use RNA-seq, sRNA-seq, ChIP-seq, GRO-seq, Ribo-seq, RIP-seq and iCLIP in wild-type worms and in knock-out and catalytically inactive mutants of the CSR-1 protein at different times of germline development.
Understanding the pathways of small RNA production during Meiotic Silencing by Unpaired DNA (MSUD) in the fungus Neurospora crassa
The canonical (“textbook”) process of DNA homology search and recognition is initiated by DNA double-strand breaks and is mediated by the universally conserved recombinases of the RecA family. Using the phenomenon “Repeat Induced Point mutation” (RIP) in N. crassa as a model system, we have previously revealed the existence of another way to search for DNA homology, which does not require RecA proteins and which apparently operates on intact DNA double helices. This pathway can be extremely efficient, as it allows some fungi to detect the presence of only two gene-sized DNA repeats in the genome. Our current work on Meiotic Silencing by Unpaired DNA (MSUD) has shown that the same recombination-independent pathway may also be involved in the early steps of homologous chromosome pairing in meiosis, thus emerging as a conserved, perhaps fundamental mechanism of DNA homology search and recognition. We are now interested in further investigating this mechanism, using RIP and MSUD as two complementary recombination-independent processes. Specifically for this project, we are interested in identifying molecular genetic features that associate with (or even trigger) the production of small RNAs during MSUD in N. crassa.
Streptococcus agalactiae (GBS) is a commensal gram-positive bacterium which asymptomatically colonizes the genital and intestinal tracts of healthy women. However, GBS is the leading cause of invasive bacterial infections in newborns in developed countries. The ability of GBS to succeed both as a commensal and as a pathogen relies on a highly dynamic regulation of colonization- and virulence-related genes. The major regulator identified to date is the two-component system CovSR (Control of Virulence Sensor and Regulator). The transcription of almost 15% of the genome is dependent on CovR, but the genes directly regulated by CovR and the regulation of CovR-DNA binding by CovR phosphorylation are ill-defined. To characterize the genome-wide CovR binding sites, we performed chromatin immunoprecipitation and sequencing (ChIP-Seq). Technically, we developed an epitope-tagged and functional form of CovR expressed under an inducible promoter. Quantitative PCR on ChIP samples (ChIP-qPCR) and small-scale footprint experiments revealed an enrichment of binding regions on specific promoters whose transcription is CovR-dependent. Sequencing (ChIP-seq) has been done to 1) characterize the landscape of CovR binding sites along the chromosome and reveal the function of genes directly regulated by CovR; 2) decipher the mechanism of regulation by performing the same experiment in strains with different levels of CovR phosphorylation; and 3) unravel an evolutionary strategy of genetic rewiring leading to the emergence of hypervirulent GBS strains by comparing the CovR direct regulon and the evolution of promoter sequences in different clinical strains.
High-throughput sequencing (NGS) and processing of DNA sequences of the variable domains of alpaca single-chain antibodies (VHH domains or Nanobodies®)
In our laboratory, we are interested in the design and construction of single-chain antibody libraries derived from the immune repertoire of alpacas. These immune libraries consist of one million different DNA sequences encoding the variable domains of single-chain antibodies specific to a given target protein, more commonly known as VHH domains or Nanobodies®. We now wish to characterize these libraries, created in the laboratory, by high-throughput DNA sequencing (NGS, MiSeq sequencing). This method produces files containing millions of DNA sequences that then need to be processed and analyzed.
This project aims to characterize defective viral genomes of shrimp pathogens, i.e. Yellow head virus (YHV), from next-generation sequencing data of viral RNAs, using DI-tector or similar tools. NGS data are already available for analysis. We are looking for someone interested in virology in general and, more precisely, in the complexity of viral genomes.
The ViroScreen workflow is a bioinformatic pipeline dedicated to the analysis of NGS metagenomic data, especially in the field of virology (virome analysis). The aim of the project is to implement this bioinformatic workflow, developed with Corinne Maufrais (Bioinformatics and Biostatistics Hub), in Galaxy, to make it easily accessible to the scientific community in an easy-to-use way.
A cost-effective molecular Tool for Strengthening Antimalarial drug Resistance surveillance in Africa (TSARA)
In the TSARA project, we aim at developing and validating a molecular platform (Dual Index Targeted Amplicon Deep Sequencing on Illumina iSeq) and a dedicated bioinformatics pipeline for targeted high throughput sequencing of genetic loci related to antimalarial drug resistance. The iSeq Illumina platform, proposed here, is designed for dual indexing of samples, consisting of a 3’ and 5’ individual barcode, which allows the user to connect every sequence generated to a specific sample (up to 384 samples). The development and the validation of this novel molecular tool will be performed at IP Paris, first by using DNA extracted from P. falciparum reference strains (3D7, Dd2 and culture-adapted Cambodian strains) (WP1) and second by using P. falciparum DBS samples collected from two malaria-endemic areas (Central African Republic and Cameroon) (WP2). In WP3, we will host staff from IP Bangui and CP Cameroon and train them in this new approach. Our final objective is for Dual Index Targeted Amplicon Deep Sequencing on iSeq, along with minimal bioinformatics infrastructure, to be operated in malaria-endemic countries to facilitate routine large-scale surveillance of the emergence of drug resistance and to ensure the continued success of the malaria treatment policy.
A cost-effective molecular Tool for Strengthening Antimalarial drug Resistance surveillance in Africa (TSARA)
In the TSARA project, we aim at developing and validating a molecular platform (Dual Index Targeted Amplicon Deep Sequencing on Illumina iSeq) and a dedicated bioinformatics pipeline for targeted high throughput sequencing of genetic loci related to antimalarial drug resistance. The iSeq Illumina platform, proposed here, is designed for dual indexing of samples, consisting of a 3’ and 5’ individual barcode, which allows the user to connect every sequence generated to a specific sample (up to 384 samples). The development and the validation of this novel molecular tool will be performed at IP Paris, first by using DNA extracted from P. falciparum reference strains (3D7, Dd2 and culture-adapted Cambodian strains) (WP1) and second by using P. falciparum DBS samples collected from two malaria-endemic areas (Central African Republic and Cameroon) (WP2). In WP3, we will host staff from IP Bangui and CP Cameroon and train them in this new approach. Our final objective is for Dual Index Targeted Amplicon Deep Sequencing on iSeq, along with minimal bioinformatics infrastructure, to be operated in malaria-endemic countries to facilitate routine large-scale surveillance of the emergence of drug resistance and to ensure the continued success of the malaria treatment policy.
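As an illustration of the dual-indexing idea described in the TSARA project (a 5’ and a 3’ barcode jointly identifying the sample a read comes from), here is a minimal Python sketch. It is not the project's pipeline: the barcode sequences, sample names, and function names are hypothetical, and only exact barcode matches are handled.

```python
from collections import defaultdict

# Hypothetical sample sheet: (5' barcode, 3' barcode) -> sample name.
# Real barcodes and sample names would come from the sequencing run setup.
SAMPLE_SHEET = {
    ("ACGT", "TTAG"): "sample_A",
    ("ACGT", "GGCA"): "sample_B",
    ("CATG", "TTAG"): "sample_C",
}


def assign_sample(index5, index3, sample_sheet):
    """Return the sample whose barcode pair matches exactly, or None."""
    return sample_sheet.get((index5, index3))


def demultiplex(reads, sample_sheet):
    """Group sequences by sample.

    `reads` is an iterable of (index5, index3, sequence) tuples.
    Reads whose barcode pair is not in the sample sheet are collected
    under the key "undetermined".
    """
    by_sample = defaultdict(list)
    for index5, index3, sequence in reads:
        sample = assign_sample(index5, index3, sample_sheet)
        key = sample if sample is not None else "undetermined"
        by_sample[key].append(sequence)
    return dict(by_sample)
```

Because a sample is identified by the pair of barcodes rather than by a single one, the same 5’ barcode can be reused with different 3’ barcodes, which is what lets a limited barcode set cover many samples.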
The Milieu Intérieur program aims to understand the genetic and environmental determinants of normal healthy immune responses. One component of this program involves analysis of the nasopharyngeal mucosal immune variation. At the upper respiratory tract level, nasal-associated lymphoid tissue is strategically located to encounter potential respiratory pathogens; in this context, B cell activation induces specific secretory IgA while T cells/ILCs produce cytokines. One factor that has been poorly studied and needs to be integrated into respiratory mucosal immune variation is the microbiome. Under normal conditions, the relationship between the nasopharyngeal microbiomes (bacterial, fungal, viral) and the host is symbiotic, with many physiologic benefits. On the other hand, there is increasing evidence suggesting that the nasopharyngeal microbiome composition plays an important role in the pathogenesis of bacterial and viral acute respiratory tract infections, including COVID-19. For example, even in healthy individuals, potentially pathogenic bacteria and viruses can be found embedded in the community of nasopharynx commensals. Still, we have little information on the characterization of “healthy” nasopharyngeal microbiomes. We have previously established methods to characterize bacterial nasopharynx microbiota by 16S rRNA sequencing, and assays are in place to analyze secretory antibody, interferon (IFN) and cytokine concentrations in nasopharyngeal secretions of healthy individuals and patients. Our capacity to quantitate antibodies, IFNs and cytokines has allowed us to determine biomarkers in nasopharyngeal secretions that can distinguish clinical outcomes in different diseases. New evidence suggests that non-bacterial microorganisms such as viruses and fungi could be critical in modulating immunity in the host. We have performed shotgun metagenomic sequencing of nasal swab RNA from 12 healthy and diseased individuals.
We would like to request bioinformatic assistance to analyze the sequencing datasets in this pilot study in order to assess potential RNA viromes (assembly, identification, clustering, and taxonomy). Successful identification of known viruses will provide proof of concept for this approach which can then be applied to well-characterized normal control and disease cohorts.