Hub members have a wide range of expertise, covering most fields of bioinformatics and biostatistics. Below is a non-exhaustive list of these areas of expertise.
Searched keyword: Data management
Related people (17)
Developing and evaluating bioinformatics tools for:
- next-generation sequencing data
- genome analysis and comparison
Specialties: Genome and Transcriptome Bioinformatics
Data management, Data Visualization, Genomics, Non-coding RNA, Sequence analysis, Transcriptomics, Genome analysis, Biostatistics, Program development, Scientific computing, Data and text mining, Biosensors and biomarkers, Epidemiology and public health
- Tissue-resident stromal cell heterogeneity (Lucie PEDUTO - Stroma, Inflammation and Tissue Repair) - In Progress
- Role of small non-coding RNAs in the adaptive response to oxidative stress in pathogenic Leptospira (NADIA BENAROUDJ - Biology of Spirochetes) - In Progress
- Dissecting Peptidoglycan pathways in human near-haploid cells (Martine FANTON D'ANDON - Biology and Genetics of Bacterial Cell Wall) - Pending
One of my projects consists of developing GRAVITY, a Java tool based on Cytoscape that integrates genetic variants within protein-protein interaction networks to allow the visual and statistical interpretation of next-generation sequencing data, ultimately helping geneticists and clinicians identify causal variants and better diagnose their patients. I am also involved in several other projects in the lab, taking part in the design of pipelines for processing and analysing genomics data, including SNP arrays, whole-exome and whole-genome sequencing data. This means facing big-data challenges, as the unit has to manage hundreds of terabytes of genomics data. Finally, I am now analysing these data to identify possible causes of autism, both to help clinicians with their diagnoses and to better understand the biological mechanisms at play in this complex disease. This work is carried out through a project aiming to understand the genetic architecture of autism in the Faroe Islands, and through the newly started IMI2 European project AIMS2-Trials.
Algorithmics, Data management, Data Visualization, Genomics, Machine learning, Proteomics, Genome analysis, Biostatistics, Program development, Scientific computing, Application of mathematics in sciences, Exploratory data analysis, Software development and engineering, Data and text mining, Genetics
I joined the Bioinformatics and Biostatistics Hub at Institut Pasteur in 2016, where I am currently developing NGS-related pipelines for the Biomics Pôle. I have an interdisciplinary research background: after a PhD in astronomy (gravitational-wave data analysis), I worked at several research institutes in the fields of plant modelling (INRIA, Montpellier, 2008-2011), systems biology, in particular logical modelling (EMBL-EBI, Cambridge, U.K., 2011-2015), and drug discovery (Sanger Institute, Cambridge, U.K., 2015). On a daily basis, I use data analysis and machine learning techniques within high-quality software to tackle scientific problems.
Algorithmics, Data management, Data Visualization, Genome assembly, Genomics, Machine learning, Modeling, Scientific computing, Databases and ontologies, Software development and engineering, Data and text mining, Illumina HiSeq, Graph theory and analysis, Illumina MiSeq
Data management, Sequence analysis, Structural bioinformatics, Database, Program development, Scientific computing, LIMS
- Common and phylogenetically widespread coding for peptides by bacterial small RNAs: follow-up of the project during its journal review (Benno SCHWIKOWSKI - Systems Biology) - Closed
- A novel MacSyFinder module for detection of bacterial capsule systems on the future Galaxy platform (Eduardo ROCHA - Microbial Evolutionary Genomics) - Closed
- Development of a web application and new functionalities for the maintenance and curation of iPPI-DB (Olivier SPERANDIO - Center for Innovation and Technological Research) - In Progress
Bernd Jagla received his PhD in bioinformatics (Department of Biology, Chemistry, and Pharmacy) from the Free University of Berlin, Germany, in 1999. Before joining the Institut Pasteur, he worked for almost ten years in New York City, including as an associate research scientist at the Joint Centers for Systems Biology (Columbia University) and at the Columbia University Screening Center led by Dr J.E. Rothman. He joined the Institut Pasteur in 2009 to take charge of the bioinformatics needs of the Transcriptome et Epigenome platform, focusing on next-generation sequencing. Since 2016 he has been a member of the C3BI HUB team, detached to the Human Immunology Center (CIH), where he provides support for cytometry, next-generation sequencing and microarray data analysis. His areas of interest include quality assurance, data analysis and visualization at the facility. He also has strong expertise in developing algorithms for function prediction from sequence data, image analysis, mass spectrometry data analysis and workflow management systems. While at Pasteur he developed:
- KNIME extensions for Next Generation Sequencing
- Post Alignment Visualization and Characterization of High-Throughput Sequencing Experiments
- Post Alignment statistics of Illumina reads
Algorithmics, ChIP-seq, Data management, Data Visualization, Image analysis, Machine learning, Sequence analysis, Database, Genome analysis, Biostatistics, Program development, Scientific computing, Data and text mining, Illumina HiSeq, Graphics and Image Processing, Illumina MiSeq, High Throughput Screening, Flow cytometry/cell sorting, PacBio
I am seeking to apply my knowledge of computer science and statistics to understand real-world data. I have an interdisciplinary background spanning complex systems, Big Data, machine learning, biostatistics and genomics. I completed a PhD in which I applied clustering and PCA to epigenomics data and gained new insights into the coupling between replication and epigenetics. I worked at Dataiku, a dynamic start-up, where I helped clients build their Big Data strategy and draw value from their data. I studied the human microbiota for two years at MetaGenoPolis (MGP), an innovative research center, where we aimed to improve human health by developing strategies (e.g. nutritional, therapeutic, preventive…) to restore dysbiotic microbiota together with our industrial and academic partners. I currently work in the statistical genetics group at Institut Pasteur, where I apply my software development and data science skills to quantify the impact of human genome variation on diverse health parameters.
Clustering, Data management, Genomics, Genome analysis, Exploratory data analysis, Genetics, Comparative metagenomics, Dimensional reduction, Multidimensional data analysis
After a PhD in Biology in 2011 on the population genetics and phylogeography of amazing little amphipods (Crangonyx, Crymostygius) at the University of Reykjavik (Iceland), I pursued my interest in bioinformatics and evolutionary biology through several postdocs in Spain (MNCN Madrid, UB Barcelona). During this time, I investigated the transcriptomic landscapes of various non-model species (Conus, Junco and caecilians) using de novo assemblies, and participated in the development of TRUFA, a web platform for de novo RNA-seq analysis. In July 2016, I joined the Revive Consortium and the Epigenetic Regulation unit at Institut Pasteur, where my main focus was transcriptomic and epigenetic analyses on various topics using short- and long-read technologies, with a special interest in detecting alternative splicing events. I joined the Bioinformatics and Biostatistics Hub in January 2018. My latest interests are long-read technologies, alternative splicing, and achieving reproducibility in bioinformatics using workflow managers, container technologies and literate programming.
Data management, Data Visualization, Sequence analysis, Transcriptomics, Web development, Genome analysis, Program development, Exploratory data analysis, Software development and engineering, Genetics, Evolution, Read mapping, Workflow and pipeline development, Population genetics, Motifs and patterns detection, Grid and cloud computing
Human, Insect or arthropod, Other animal, Anopheles gambiae (African malaria mosquito), Mouse
- Build software to decipher Gephyrin alternative transcripts obtained with long-read sequencing (Eric ALLEMAND - Epigenetic Regulation) - Pending
- Transcriptomics of Anopheles - Plasmodium vivax interactions towards identification of malaria transmission-blocking targets (Catherine BOURGOUIN - Functional Genetics of Infectious Diseases) - In Progress
- Mapping of enhancers from transcriptome data (Christian MUCHARDT - Epigenetic Regulation) - In Progress
Data management, Machine learning, Statistical inference, Scientific computing, Exploratory data analysis, Software development and engineering, Parallel computing, Neuroimaging and computational neuroscience, Grid and cloud computing
After a Master's degree in bioinformatics and biostatistics, I did a PhD in computer science/bioinformatics at University Paris-Sud (now part of University Paris-Saclay), where I worked on the integration and analysis of comparative genomics data. After a postdoc in Lausanne, Switzerland, where I worked on small-RNA sequencing data, I joined GenoSplice, where I was responsible for the development of bioinformatics projects related to next-generation sequencing. I joined Institut Pasteur in November 2015 to work in the Evolutionary Bioinformatics Unit and to participate in the development of new tools and algorithms able to efficiently tackle the ever-increasing amount of sequencing data.
Algorithmics, Data management, Phylogenetics, Sequence analysis, Database, Genome analysis, Program development, Scientific computing, Databases and ontologies, Sequencing, Workflow and pipeline development
After a PhD in bioinformatics at Inria/IRISA, Université de Rennes 1, Rennes (France), under the supervision of Dominique Lavenier and Pierre Peterlongo, I did a postdoc in bioinformatics at the Laboratory of Ecology and Evolution of Plankton of the Stazione Zoologica Anton Dohrn in Naples, Italy. Both my thesis and my postdoc were part of the Tara Oceans project and involved developing new software to analyze huge quantities of raw reads from metagenomics samples. I currently hold a research engineer position at the Hub as leader of the ALPS group, and focus on several computing problems, including metagenomics, protein assembly and various short-term developments.
Algorithmics, Data management, Proteomics, Database, Program development, Scientific computing, Software development and engineering, Comparative metagenomics
- Analysis of neuronal population dynamics in rodents during virtual navigation (Christoph SCHMIDT-HIEBER - Neural circuits for spatial navigation and memory) - Pending
- Recombination among enteroviruses (Maël BESSAUD - Biology of Enteric Viruses) - Pending
- Identification of new or unexpected pathogens, including viruses, bacteria, fungi and parasites, associated with acute or progressive diseases (Marc ELOIT - Biology of Infection) - In Progress
After a PhD in the biochemistry of rapeseed proteins, during which I developed my first automated scripts for data processing and analysis, I joined the Danone research center to develop multivariate models predicting milk protein composition from infrared spectrometry.
As I was already developing my own informatics tools, I decided to attend the Institut Pasteur course in informatics for biology in 2007. At the end of the course I was recruited by the Institute and joined Alain Jacquier's unit, “Génétique des interactions macromoléculaires” (Genetics of Macromolecular Interactions). Within this group, I learned to handle sequencing data and developed processing and analysis tools using Python and R. I also created a genome browser and database system for storing, retrieving and visualizing microarray data. After 8 years in Alain Jacquier’s lab, I joined the Hub of bioinformatics and biostatistics as co-head of the team.
Clustering, Data management, Sequence analysis, Transcriptomics, Web development, Database, Genome analysis, Program development, Scientific computing, Exploratory data analysis, Data and text mining, Illumina HiSeq, Read mapping, LIMS, Illumina MiSeq, High Throughput Screening, Multidimensional data analysis, Workflow and pipeline development, Ribosome profiling, Motifs and patterns detection
- Identification of eukaryotic 5'UTRs (Arnaud ECHARD - Membrane Traffic and Cell Division) - Closed
- Super-resolution imaging and reconstructions of human cell chromosome architecture (Xian HAO - Imaging and Modeling) - In Progress
- Utilize mouse models to study infection by HIV-1 (Valentina LIBRI - Center for Translational Science) - Awaiting Publication
After a Master's degree in Genome Analysis and Molecular Modeling at Denis Diderot University, I did a PhD in NMR/bioinformatics at Denis Diderot University, where I worked on the development and use of DaDiModO, a program that uses SAXS and RDC/NMR data to calculate protein structural models. After a postdoc in the Structural Bioinformatics team at Institut Pasteur, in collaboration with IBCP, aimed at adapting the ARIA software to run on a computing grid, I joined the CIB/DSI team, where I was responsible for the development of bioinformatics projects and for the deployment, maintenance and evolution of the Pasteur Galaxy server. I joined the Hub/C3BI team in 2017 as a research engineer, and I am involved in several projects in structural bioinformatics and in software and web development. I am also in charge of maintaining the Galaxy Pasteur instance.
Data management, Structural bioinformatics, Database, Program development, Scientific computing, Databases and ontologies, Grid and cloud computing
- Integration of bioinformatics tools into Galaxy for bacterial identification (ANNE LE FLECHE - Department of Infection & Epidemiology, Environment and Infectious Risks) - In Progress
- Identification of APOBEC3 mutations in cancer genomes (Vincent CAVAL - Molecular Retrovirology) - In Progress
- Assembly of insect virus genome (Karin EIGLMEIER - Genetic and Genomics of Insects Vectors) - In Progress
Professional experience
- 2017 - today: Institut Pasteur, Paris, HUB team, Bioinformatician
- 2001 - 2017: Institut Pasteur, Paris, CIB/DSI, Engineer
- 1997 - 2000: Thesis in NMR and molecular modelling, CEA, Saclay
Data management, Sequence analysis, Transcriptomics, Genome analysis, Program development, Scientific computing
Fungi, Candida albicans, Cryptococcus gattii, Cryptococcus neoformans
- Maintenance of the Collection of Cyanobacteria website (Bénédicte BENEDIC - Collection of Cyanobacteria) - Pending
- Trichosporon asahii NGS analysis (Marie DESNOS-OLLIVIER - Molecular Mycology) - In Progress
- Development of a bioinformatics workflow dedicated to the analysis of the viral metagenome: from NGS raw data to the identification of novel viruses (Laurent DACHEUX - Lyssavirus Dynamics and Host Adaptation) - In Progress
Data management, Data Visualization, Web development, Database, Program development, Databases and ontologies, Software development and engineering, Data integration, Workflow and pipeline development
- crispr.pasteur.fr (David BIKARD - Synthetic Biology) - Closed
- The Flemmingsome: the proteome of intact cytokinetic midbodies (NEETU GUPTA-ROSSI - Membrane Traffic and Cell Division) - Awaiting Publication
- Development of a software tool to integrate bottom-up, middle-down and top-down proteomics data (Mariette MATONDO - Proteomics, Structural Mass Spectrometry and Proteomics) - Pending
After graduating in “Structural Genomics and Bioinformatics”, I worked for almost 6 years, mainly at the Genoscope (CEA) in the LABGeM team, on the microbial annotation platform MicroScope. I focused specifically on functional annotation and on the prediction and reconstruction of microbial metabolic pathways, through pipeline implementation, database modeling and web interface development. More broadly, working within the MicroScope platform allowed me to cover the whole annotation process, from genome assembly and gene prediction to network reconstruction. I also performed several comparative genomics analyses. As a member of the Hub team, I now take part in various projects involving HTS data on different subjects (lncRNAs and stem cells, HIV integration and DNA structure, ribosomal protein genes and genome evolution, natural antisense transcripts in compact genomes…).
Data management, Genomics, Sequence analysis, Web development, Database, Genome analysis, Databases and ontologies, Orthology and paralogy analysis, Read mapping, Sequence homology analysis, Gene prediction
- Genomic DNA sequencing of Burkholderia ambifaria Q53 strain isolated from peanut rhizospheric soil (Mathilde BEN ASSAYA - Structural Microbiology) - In Progress
- Comparative genomics of Helicobacter pylori bismuth-resistant strains (Hilde DE REUSE - Helicobacter Pathogenesis) - Pending
- Sequence analysis of Mycobacterium marinum mutants (Mena CIMINO - Department of Genomes and Genetics) - In Progress
Activities
- Contact person for any subject related to IFB.
- Help scientists develop new tools (architecture, design, implementation).
- Lead the Python Working Group at Pasteur.
- O|B|F (http://www.open-bio.org/) member.
Skills
- Strong programming experience in Python
- Software architecture and design
- NoSQL databases (MongoDB, CouchDB)
- XML/YAML
- Continuous integration (GitHub/Travis CI/Read the Docs, GitLab/GitLab CI)
- Containers (Docker, Singularity)
- Linux (Gentoo, Xubuntu)
- IFB developer
Main projects on the campus
- Mobyle (http://Mobyle.pasteur.fr); Mobyle: a new full web bioinformatics framework
- IntegronFinder (ongoing project)
- MacSyFinder (ongoing project)
- GitHub: access to my projects on GitHub
Teaching
- Unix (Unix-I, Unix-II)
- Python
Education
- 2002: PhD in Molecular and Cellular Biology, “Rôle de deux protéines QN1 et PATF impliquées dans l’arrêt de prolifération des cellules de la neurorétine aviaire au cours du développement” (Role of two proteins, QN1 and PATF, involved in the arrest of proliferation of avian neuroretina cells during development)
- 2001: “Informatique En Biologie” course (Pasteur)
Data management, Database, Program development, Scientific computing, Databases and ontologies
- Move of the DISCO-BAC server VM to the new DSI infrastructure (Benno SCHWIKOWSKI - Systems Biology) - Closed
- Genetic and statistical analysis of data produced with the Collaborative Cross at the Institut Pasteur (Xavier MONTAGUTELLI - Mouse Genetics) - In Progress
- MacSyDBCapsule (Eduardo ROCHA - Microbial Evolutionary Genomics) - Closed
Dr. Natalia Pietrosemoli is an engineer with an M.Sc. in Modeling and Simulation of Complex Realities from the International Center for Theoretical Physics (ICTP) and the International School for Advanced Studies (SISSA) (Trieste, Italy). During her M.Sc. internships she mostly worked on modeling, optimization, combinatorics and information theory applied to medical imaging. In 2012 she obtained a Ph.D. in Computational Biology from the School of Bioengineering of Rice University (Houston, TX, US), where she specialized in computational structural biology and functional genomics. Her doctoral thesis, “Protein functional features extracted from primary sequences: a focus on disordered regions”, contributed to a better understanding of the functional and evolutionary role of intrinsic disorder in protein plasticity, complexity and adaptation to stress conditions. As part of her Ph.D., Natalia was a visiting scholar in two labs in Madrid: the Structural Computational Biology Group at the Spanish National Cancer Research Centre (CNIO), where she mainly worked on sequence analysis and the functional-structural relationships of proteins, and the Computational Systems Biology Group at the Spanish National Centre for Biotechnology (CNB-CSIC), where she studied the functional implications of intrinsically disordered proteins at the genomic level in several organisms, collaborating with different experimental and theoretical groups. In 2013, she joined the Swiss Institute of Bioinformatics as a postdoctoral fellow in the Bioinformatics Core Facility. Her main project consisted of the molecular classification of a rare type of lymphoma, which involved integrating transcriptomic, clinical and mutational data to identify molecular markers for classification, diagnosis and prognosis. This work was performed in collaboration with the Pathology Institute of the University Hospital of Lausanne (CHUV).
In November 2015 Natalia joined the Hub team at Pasteur C3BI as a Senior Bioinformatician. Natalia is especially interested in the integrative analysis of different omics data, for both large-scale and small datasets, and loves collaborating in interdisciplinary environments and getting feedback from her experimental colleagues. She is currently coordinating several projects performing functional and pathway analysis at the genomic level. Grouping genes, proteins and other biological molecules into the pathways they are involved in significantly reduces the complexity of the analyses while increasing their explanatory power, compared with a plain list of differentially expressed genes or proteins.
Algorithmics, Data management, Genomics, Image analysis, Machine learning, Modeling, Proteomics, Sequence analysis, Structural bioinformatics, Transcriptomics, Database, Genome analysis, Biostatistics, Scientific computing, Databases and ontologies, Application of mathematics in sciences, Data and text mining, Genetics, Graphics and Image Processing, Biosensors and biomarkers, Clinical research, Cell biology and developmental biology, Interactomics, Bioimage analysis
- Determination of the transcriptome controlled by the two-component system BvrR/BvrS using dominant positive and negative BvrR mutants (Javier PIZARRO-CERDA - Yersinia) - Pending
- Transcriptional analysis of intestinal cancer cells vs. normal cells after co-culture with the cancer-associated bacterium Streptococcus gallolyticus (Ewa PASQUEREAU - Biology of Gram-Positive Pathogens) - Pending
- Functional interactomics of SKAP2 (Jean-François BUREAU - Functional Genetics of Infectious Diseases) - Pending
Related projects (4)
Innate lymphoid cells (ILCs) are the most recently identified components of the innate immune system. ILCs colonize different tissue sites and react promptly to microenvironmental perturbations. Owing to their high plasticity, ILCs can shape their functional output in response to local cues. As such, ILCs play roles under homeostatic conditions and in the context of infection, chronic inflammation, metabolic diseases and cancer. Several ILC subsets (NK cells, ILC2s) have been shown to regulate metabolic homeostasis. Metabolic states affect cellular functions and have been shown to play an important role in the regulation of adaptive immunity. In contrast, almost nothing is known about innate lymphocyte metabolism or the importance of energy regulation for ILC function. This project will study metabolic profiles of human ILC subsets under diverse environmental conditions. Enhancing or interfering with ILC activity could ultimately provide a useful new therapy for chronic inflammatory diseases.
Insight into the Immune System: A bioresource and data-sharing platform to study chronic inflammatory diseases (IsIShare)
Chronic inflammatory systemic diseases (CIDs) are a burden to humans because of life-long debilitating illness, increased mortality and high therapy costs. Their increasing prevalence in Western countries has placed them third among causes of morbidity and mortality. Unfortunately, available treatments are poorly targeted and non-curative, partly because of a complex and poorly understood pathophysiology. Genetic susceptibility clearly plays a role. Genes linked to the immune system have been identified, but causal genes remain mostly unknown, and other factors such as the intestinal microbiota have also been implicated. The complexity of CID pathophysiology suggests that a holistic approach is the most likely to yield significant progress. Our project intends to take advantage of recent technical progress and the development of informatics tools to set up a transversal approach. High-resolution sequencing technology rapidly produces large amounts of accurate data, and new integrative informatics tools allowing the storage and integrative analysis of these large datasets are now available. We intend to set up a CID network allowing the gathering and extensive analysis of data related to immunogenetic determinants, the immune repertoire and the microbiota of individuals suffering from one of the three major interlinked CIDs, namely Hidradenitis Suppurativa (HS), Crohn’s disease (CD) and Spondyloarthropathy (SpA), compared with healthy volunteers.
We want to automate the merging of CSV documents for data analysis purposes. We have already discussed this with Gael Millot.
Bacteriophages infect bacteria. One bacteriophage can infect several strains. Around the world, many labs have performed spot tests to determine the host range of bacteriophages but this information i