Hub members have expertise covering most fields in bioinformatics and biostatistics. Below is a non-exhaustive list of these areas of expertise.


Searched keyword : Data Visualization

Related people (10)


One of my projects consists of developing GRAVITY, a Java tool based on Cytoscape that integrates genetic variants within protein-protein interaction networks to allow the visual and statistical interpretation of next-generation sequencing data, ultimately helping geneticists and clinicians to identify causal variants and better diagnose their patients. I'm also involved in several other projects in the lab, taking part in the design of pipelines for processing and analysing genomics data, including SNP arrays, whole-exome and whole-genome sequencing data. This means facing big-data challenges, as the unit has to manage hundreds of terabytes of genomics data. Finally, I am now analysing these data to identify possible causes of autism, to help clinicians with their diagnosis and to better understand the biological mechanisms at play in this complex disease. This is done through a project that aims to understand the genetic architecture of autism in the Faroe Islands, and through the newly started IMI2 European project AIMS2-Trials.

Algorithmics, Data management, Data Visualization, Genomics, Machine learning, Proteomics, Genome analysis, Biostatistics, Program development, Scientific computing, Application of mathematics in sciences, Exploratory data analysis, Software development and engineering, Data and text mining, Genetics

Projects (0)

    Thomas COKELAER

    Group : DETACHED - Detached : Biomics

    I joined the Bioinformatics and Biostatistics Hub at Institut Pasteur in 2016, where I am currently developing NGS pipelines for the Biomics Pôle. I have interdisciplinary research experience: after a PhD in Astronomy (gravitational wave data analysis), I joined several research institutes to work in the fields of plant modelling (INRIA, Montpellier, 2008-2011), Systems Biology, in particular logical modelling (EMBL-EBI, Cambridge, U.K., 2011-2015), and drug discovery (Sanger Institute, Cambridge, U.K., 2015). On a daily basis, I use data analysis and machine learning techniques within high-quality software to tackle scientific problems.

    Algorithmics, Data management, Data Visualization, Genome assembly, Genomics, Machine learning, Modeling, Scientific computing, Databases and ontologies, Software development and engineering, Data and text mining, Illumina HiSeq, Graph theory and analysis, Illumina MiSeq

    Projects (2)

    Bernd JAGLA

    Group : PLATEFORM - Detached : Biomarker Discovery

    Bernd Jagla received his PhD in bioinformatics (Department of Biology, Chemistry, and Pharmacy) from the Free University in Berlin, Germany, in 1999. Before joining the Institut Pasteur, he worked for almost ten years in New York City, including as an associate research scientist in the Joint Centers for Systems Biology (Columbia University) and at the Columbia University Screening Center led by Dr J.E. Rothman. He joined the Institut Pasteur in 2009 to take charge of the bioinformatics needs of the Transcriptome et Epigenome platform, focusing on next-generation sequencing. Since 2016 he has been a member of the C3BI - HUB Team, detached to the Human Immunology Center (CIH), where he provides support for cytometry, next-generation sequencing, and microarray data analysis. His areas of interest include quality assurance, data analysis, and visualization at the facility. He also has strong expertise in developing algorithms for function prediction from sequence data, image analysis, analysis of mass spectrometry data, and workflow management systems. While at Pasteur he developed KNIME extensions for next-generation sequencing; a tool for post-alignment visualization and characterization of high-throughput sequencing experiments; and a tool for post-alignment statistics of Illumina reads.

    Algorithmics, ChIP-seq, Data management, Data Visualization, Image analysis, Machine learning, Sequence analysis, Database, Genome analysis, Biostatistics, Program development, Scientific computing, Data and text mining, Illumina HiSeq, Graphics and Image Processing, Illumina MiSeq, High Throughput Screening, Flow cytometry/cell sorting, Pac Bio

    Projects (1)

    Etienne KORNOBIS

    Group : GORE - Embedded : Epigenetic regulation

    After a PhD in Biology in 2011 on the population genetics and phylogeography of amazing little amphipods (Crangonyx, Crymostygius) at the University of Reykjavik (Iceland), I pursued my interest in Bioinformatics and Evolutionary Biology in various post-docs in Spain (MNCN Madrid, UB Barcelona). During this time, I investigated transcriptomic landscapes of various non-model species (groups Conus, Junco and Caecilians) using de novo assemblies and participated in the development of TRUFA, a web platform for de novo RNA-seq analysis. In July 2016, I joined the Revive Consortium and the Epigenetic Regulation unit at Pasteur Institute, where my main focus was transcriptomic and epigenetic analyses on various topics using short- and long-read technologies, with a special interest in the detection of alternative splicing events. I joined the Bioinformatics and Biostatistics Hub in January 2018. My latest interests are long-read technologies, alternative splicing, and achieving reproducibility in bioinformatics using workflow managers, container technologies and literate programming.

    Data management, Data Visualization, Sequence analysis, Transcriptomics, Web development, Genome analysis, Program development, Exploratory data analysis, Software development and engineering, Genetics, Evolution, Read mapping, Workflow and pipeline development, Population genetics, Motifs and patterns detection, Grid and cloud computing
    Human, Insect or arthropod, Other animal, Anopheles gambiae (African malaria mosquito), Mouse
    Projects (3)

    Pierre LECHAT

    Group : ALPS - Hub Core

    I have been involved in genomic projects for prokaryotic and human genetic studies (GWAS) since 1998. Currently, I am working on novel visualization techniques to explore large and highly complex data sets. I have developed a web-based graphical user interface called SynTView to visualize biological features in comparative genomic studies. The tool allows interactive visualization of microbial genomes to investigate massive amounts of information efficiently. The software is characterized by the presentation of syntenic organisations of microbial genomes and the visualization of polymorphism data. I am extending this work by designing novel dynamic views for the comparative analysis of viruses in emerging diseases.

    Data Visualization, Database, Software development and engineering, Comparative metagenomics, Orthology and paralogy analysis

    Projects (20)

    Rachel TORCHET

    Group : WINTER - Hub Core

    In 2012 I completed my master's degree at the MicroScope Platform located at Genoscope (the French National Sequencing Center). I was involved in a project for managing evolution projects that rely on Next Generation Sequencing (NGS) technologies to decipher the dynamics of genomic changes as well as the molecular bases and mechanisms underlying the adaptive evolution of micro-organisms (Remigi et al. 2014). In November 2014, I joined the Bioinformatics and Biostatistics HUB at Institut Pasteur. I participated in the creation and updates of the C3BI website. I joined the WINTER group, where I'm in charge of web and interface development projects. I have completed UX design training to add extra value to my front-end development skills. I design and develop user-oriented bioinformatics tools and interfaces.

    Data Visualization, Web development, Database, Genome analysis, Scientific computing, Databases and ontologies, Software development and engineering, Workflow and pipeline development

    Projects (8)

    Related projects (29)

    Phylogenetic analysis of the Leishmania HSP70 protein family

    Project status : Closed

    Influence of chromatin dynamics on genomic stability during replication

    Genomic DNA is hierarchically packed within living cells, and genome duplication requires the concerted effort of many thousands of individual replication units. To ensure the integrity of transmission of the genetic information, both eukaryotes and prokaryotes have therefore evolved sophisticated mechanisms to monitor DNA replication. Some of these mechanisms aim to maintain both a temporal and a spatial organization of the replication program, leading to multiple replication time regions and to compartmentalization into replication foci, subnuclear sites that accumulate numerous DNA replication factors. It should be noted that Saccharomyces cerevisiae represents an exception to the standard eukaryotic strategy for genome duplication. Similar to bacteria, S. cerevisiae possesses well-defined replication origin sequences that can fire at a very efficient rate during S phase, leading to a very homogeneous pattern of DNA replication. A common model suggests that, once replication starts, dynamic events take place: co-regulated replication forks with similar replication timing cluster within a discrete number of foci that show distinct patterns of nuclear localization over the S phase. Once initiated, DNA synthesis might be compromised if the replication fork encounters an RFB (Replication Fork Barrier) such as a DNA lesion or a tightly bound protein-DNA complex. RFBs are considered a potential source of genetic instability and may lead to many chromosomal rearrangements. As a consequence, eukaryotes employ a complex DNA damage response against RFBs, which aims to maintain the stability of the stalled forks and provides the time required to repair and resume replication. Recent observations suggest that the non-random organization of the nucleus affects where repair occurs. The aim of this project is to reach a better understanding of the influence of nuclear spatial architecture and organization at replication fork blocks.

    Project status : Closed


    Cyanobacteria are microorganisms that proliferate in many water bodies and disrupt their functioning and uses, because they can produce toxins that are dangerous to human and animal health. Although health regulations are currently based on monitoring a single toxin, it is now known that these microorganisms can synthesize a large number of toxins, which should be better taken into account in the future. To better understand the toxic potential of cyanobacteria, my thesis therefore uses genome studies and a chemistry approach to characterize the genes involved in the synthesis of these metabolites as well as the metabolites produced by these genes; to determine, in cultured strains and in natural samples from water bodies in Ile-de-France, the potential for producing these metabolites; and to better understand the environmental factors that favor this production. Two teams in Paris (Pasteur and iEES) are associated with this work, which also involves foreign collaborations. While it is now well known that a large part of the metabolism of cyanobacteria, which are photosynthetic microorganisms, is regulated according to light and dark phases, the available knowledge on the synthesis of secondary metabolites is much more limited. These metabolites are nevertheless of twofold interest, since some are toxic to humans while others have potential pharmaceutical value. Their synthesis relies on the expression of gene clusters that can be very large (up to 100 kb per region).

    Project status : Closed

    Listeriomics - Development of a web platform for visualization and analysis of Listeria omics data

    Over the past three decades Listeria has become a model organism for host-pathogen interactions, leading to critical discoveries in a broad range of fields including virulence-factor regulation, cell biology, and bacterial pathophysiology. More recently, the amount of Listeria "omics" data produced has increased exponentially, not only in terms of number but also in terms of heterogeneity. There are now more than 40 published Listeria genomes, around 400 different transcriptomics data sets and 10 proteomics studies available. The capacity to analyze these data through a systems biology approach, and to generate tools that let biologists analyze these data themselves, is a challenge for bioinformaticians. To tackle these challenges we are developing a web-based platform named Listeriomics, which integrates different types of tools for "omics" data manipulation, the two most important being: 1) a genome viewer for displaying gene expression array, tiling array, and RNA-Seq data along with proteomics and genomics data; 2) an expression atlas, a query-based tool that connects every genomic element (genes, small RNAs, antisense RNAs) to the most relevant "omics" data. Our platform already integrates all genomics and transcriptomics data ever published on Listeria and will thus allow biologists to analyze all these data dynamically, and bioinformaticians to have a central database for network analysis. Finally, it has already been used several times in our laboratory for different types of studies, including transcriptomics analysis in different biological conditions and whole-genome analysis of Listeria protein N-termini. This project is funded by an ANR Investissement d'avenir: BACNET 10-BINF-02-01

    Project status : Closed

    Comparative analysis of the virulence plasmids of Shigella spp. and entero-invasive Escherichia coli

    Context. Bacteria of the genus Shigella and strains of entero-invasive Escherichia coli (EIEC) are responsible for bacillary dysentery (shigellosis) in humans. Although (very) closely related to E. coli, the genus Shigella is divided into four "species": S. boydii, S. dysenteriae, S. flexneri and S. sonnei. Most virulence determinants enabling these bacteria to enter into and disseminate within epithelial cells are encoded by a 200-kb virulence plasmid (VP). The first complete sequence of a VP (pWR100 from a S. flexneri strain of serotype 5a) was determined by our laboratory in 2000. The VP contains genes of different origins, as attested by their G+C content ranging from 30 to 60%, traces of four plasmids and a large number of various insertion sequences (IS) representing 30-40% of the total sequence (Buchrieser et al., 2000). In addition to IS sequences, the VP carries members of several multigene families (exhibiting over 90% identity). Such repeated sequences are potentially prone to recombination (allelic exchange, gene conversion) and deletion. Based on the analysis of three genes carried by the VP, it has been proposed that, depending on the species / phylogenetic group, there are two forms of the VP (pInvA & pInvB) that were acquired independently in different original E. coli strains. General questions. What are the architectures of the VP from different phylogenetic groups, and how different are pInvA and pInvB? Which genes are conserved in all VP and which genes are unique to some VP? Did recombinations occur and, if so, where and when? To answer these questions, a comparative analysis of the genetic organization and gene conservation among the VP from different phylogenetic groups of Shigella/EIEC has been undertaken using the available complete (or presented as such) sequences of 15 VP, including three members for each of five phylogenetic groups (S. boydii, S. dysenteriae 1, S. flexneri, S. sonnei and EIEC).

    Project status : Closed

    JASS: an online tool for the joint analysis of GWAS summary statistics

    In recent years, large genome-wide association studies (GWAS) have been successful in identifying thousands of significant genetic associations for multiple traits and diseases [1]. In the course of this endeavor, sample size has proven to be the key factor for identifying new variants. For example, GWAS of body mass index (BMI), now including up to 350,000 individuals from more than 100 cohorts, have been able to identify genetic variants that explain as little as 0.02% of BMI variance [2]. While standard approaches for detecting new genetic variants associated with traits and diseases will continue as sample sizes increase, multivariate analyses have been proposed as an alternative strategy both for improving detection of new variants and for exploring the multidimensional components of complex traits and diseases. Intuitively, multivariate analysis can be used to improve detection of variants displaying a pleiotropic effect [3] by accumulating moderate evidence of association across multiple traits and diseases. Several recent examples have been published not only of GWAS hit overlap across related traits [4], but also of genome-wide shared genetic effects [5]. Multivariate analyses of GWAS have also proven useful for understanding shared genetics between diseases [5] and potential causal relationships between phenotypes using Mendelian randomization (MR) [6]. Importantly, most existing multivariate methods are based on GWAS summary statistics, while approaches based on individual-level data have seldom been considered because of major practical and ethical issues. Continuing ongoing work on multi-phenotype analysis (Aschard et al 2014 [7], Aschard et al 2015 [8]), we developed an effective and robust multivariate approach for GWAS summary statistics that addresses the major barrier of existing approaches, i.e. the correlation between studies that exists when the GWAS analyzed share samples [9-16].
Our approach consists in a robust omnibus multivariate test of GWAS summary statis

    Project status : Closed
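
To make the idea of an omnibus multivariate test on GWAS summary statistics more concrete, here is a minimal sketch of one common form of such a test, restricted to two traits. It is an illustration under stated assumptions and is not confirmed to be the exact method used in JASS: it assumes each GWAS reports a z-score for the variant, and that under the null the two z-scores follow a bivariate normal distribution with unit variances and a known correlation `rho` (for instance induced by shared samples). The function name `bivariate_omnibus` and the input values are hypothetical.

```python
import math


def bivariate_omnibus(z1, z2, rho):
    """Omnibus chi-square statistic and p-value for two correlated z-scores.

    rho is the assumed correlation between the two z-scores under the null
    (for example, induced by samples shared between the two GWAS).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    # z^T Sigma^{-1} z written out for the 2x2 correlation matrix [[1, rho], [rho, 1]]
    stat = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    # With 2 degrees of freedom the chi-square survival function is exp(-x / 2)
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical z-scores for one variant in two GWAS (illustrative values only)
stat, p = bivariate_omnibus(2.5, 3.0, rho=0.3)
print(f"T = {stat:.3f}, p = {p:.3g}")
```

With more than two traits the same statistic would be computed from the full correlation matrix and compared with a chi-square distribution whose degrees of freedom equal the number of traits, for example with `scipy.stats.chi2.sf`. Ignoring `rho` when samples overlap would treat correlated evidence as independent, which is the issue the project description raises.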