Expertise

Hub members have expertise in many areas, covering most fields of bioinformatics and biostatistics. Below is a non-exhaustive list of these areas of expertise.


Searched keyword : Data Visualization

Related people (11)

Freddy CLIQUET


One of my projects consists of developing GRAVITY, a Java tool based on Cytoscape that integrates genetic variants within protein-protein interaction networks to allow the visual and statistical interpretation of next-generation sequencing data, ultimately helping geneticists and clinicians to identify causal variants and better diagnose their patients. I’m also involved in several other projects in the lab, taking part in the design of pipelines for processing and analysing genomics data, including SNP arrays, whole-exome and whole-genome sequencing data. This means dealing with big-data challenges, as the unit has to manage hundreds of terabytes of genomics data. Finally, I am now analysing these data to identify possible causes of autism, to help clinicians with their diagnoses and to better understand the biological mechanisms at play in this complex disease. This is done through a project aiming to understand the genetic architecture of autism in the Faroe Islands, and also through the newly started IMI2 European project AIMS2-Trials.


Keywords
Algorithmics, Data management, Data Visualization, Genomics, Machine learning, Proteomics, Genome analysis, Biostatistics, Program development, Scientific computing, Application of mathematics in sciences, Exploratory data analysis, Software development and engineering, Data and text mining, Genetics
Organisms

Projects (0)

    Thomas COKELAER

    Group : PLATEFORM - Detached : Biomics

    I joined the Bioinformatics and Biostatistics Hub at Institut Pasteur in 2016, where I am currently developing NGS-related pipelines for the Biomics Pôle. I have interdisciplinary research experience: after a PhD in Astronomy (gravitational wave data analysis), I joined several research institutes to work in the fields of plant modelling (INRIA, Montpellier, 2008-2011), Systems Biology, in particular logical modelling (EMBL-EBI, Cambridge, U.K., 2011-2015), and drug discovery (Sanger Institute, Cambridge, U.K., 2015). On a daily basis, I use data analysis and machine learning techniques within high-quality software to tackle scientific problems.


    Keywords
    Algorithmics, Data management, Data Visualization, Genome assembly, Genomics, Machine learning, Modeling, Scientific computing, Databases and ontologies, Software development and engineering, Data and text mining, Illumina HiSeq, Graph theory and analysis, Illumina MiSeq
    Organisms

    Projects (2)

    Bernd JAGLA

    Group : PLATEFORM - Detached : Biomarker Discovery

    Bernd Jagla received his PhD in bioinformatics (department of Biology, Chemistry, and Pharmacy) from the Free University in Berlin, Germany in 1999. Before joining the Institut Pasteur, he worked for almost ten years in New York City, including as an associate research scientist in the Joint Centers for Systems Biology (Columbia University) and at the Columbia University Screening Center led by Dr J.E. Rothman. He joined the Institut Pasteur in 2009 to take charge of the bioinformatics needs of the Transcriptome et Epigenome platform, focusing on Next Generation Sequencing. Since 2016 he has been a member of the C3BI – HUB Team, detached to the Human immunology center (CIH), where he provides support for cytometry, next generation sequencing, and microarray data analysis. His areas of interest include quality assurance, data analysis, and visualization at the facility. He also has strong expertise in developing algorithms for function prediction from sequence data, image analysis, analysis of mass spectrometry data, and workflow management systems. While at Pasteur he developed: KNIME extensions for Next Generation Sequencing; Post Alignment Visualization and Characterization of High-Throughput Sequencing Experiments; and Post Alignment statistics of Illumina reads.


    Keywords
    Algorithmics, ChIP-seq, Data management, Data Visualization, Image analysis, Machine learning, Sequence analysis, Database, Genome analysis, Biostatistics, Program development, Scientific computing, Data and text mining, Illumina HiSeq, Graphics and Image Processing, Illumina MiSeq, High Throughput Screening, Flow cytometry/cell sorting, Pac Bio
    Organisms

    Projects (2)

    Etienne KORNOBIS

    Group : PLATEFORM - Detached : Epigenetic regulation

    After a PhD in Biology in 2011 on the population genetics and phylogeography of amazing little amphipods (Crangonyx, Crymostygius) at the University of Reykjavik (Iceland), I pursued my interest in Bioinformatics and Evolutionary Biology in various post-docs in Spain (MNCN Madrid, UB Barcelona). During this time, I investigated transcriptomic landscapes for various non-model species (groups Conus, Junco and Caecilians) using de novo assemblies and participated in the development of TRUFA, a web platform for de novo RNA-seq analysis. In July 2016, I joined the Revive Consortium and the Epigenetic Regulation unit at the Pasteur Institute, where my main focus was transcriptomic and epigenetic analyses on various topics using short- and long-read technologies, with a special interest in detecting alternative splicing events. I joined the Bioinformatics and Biostatistics Hub in January 2018. My latest interests are long-read technologies, alternative splicing, and achieving reproducibility in Bioinformatics using workflow managers, container technologies and literate programming.


    Keywords
    Data management, Data Visualization, Sequence analysis, Transcriptomics, Web development, Genome analysis, Program development, Exploratory data analysis, Software development and engineering, Genetics, Evolution, Read mapping, Workflow and pipeline development, Population genetics, Motifs and patterns detection, Grid and cloud computing
    Organisms
    Human, Insect or arthropod, Other animal, Anopheles gambiae (African malaria mosquito), Mouse
    Projects (3)

    Pierre LECHAT

    Group : ALPS - Hub Core

    I have been involved in genomic projects for prokaryotic and human genetic studies (GWAS) since 1998. Currently, I am working on novel visualization techniques to explore large and highly complex data sets. I have developed a web-based graphical user interface called SynTView (http://genopole.pasteur.fr/SynTView/) to visualize biological features in comparative genomic studies. The tool allows interactive visualization of microbial genomes to investigate massive amounts of information efficiently. The software is characterized by the presentation of syntenic organisations of microbial genomes and the visualization of polymorphism data. I am extending this work by designing novel dynamic views for the comparative analysis of viruses in emerging diseases.


    Keywords
    Data Visualization, Database, Software development and engineering, Comparative metagenomics, Orthology and paralogy analysis
    Organisms

    Projects (26)

    Rachel TORCHET

    Group : WINTER - Hub Core

    In 2012 I completed my master's degree at the MicroScope Platform located at Genoscope (the French National Sequencing Center). I was involved in a project on the management of evolution projects that rely on Next Generation Sequencing (NGS) technologies to decipher the dynamics of genomic changes as well as the molecular bases and mechanisms underlying the adaptive evolution of micro-organisms (Remigi et al. 2014). In November 2014, I joined the Bioinformatics and Biostatistics HUB at Institut Pasteur. I participated in the creation and updates of the C3BI website. I then joined the WINTER group, where I’m in charge of web and interface development projects. I have completed UX design training to add extra value to my front-end development skills. I design and develop user-oriented bioinformatics tools and interfaces.


    Keywords
    Data Visualization, Web development, Database, Genome analysis, Scientific computing, Databases and ontologies, Software development and engineering, Workflow and pipeline development
    Organisms

    Projects (12)

    Johann DRÉO


    As a senior research engineer, I have explored many corners of computer science and artificial intelligence. I can most notably help you on the following topics.

    Skills

    Decision support systems : design and implementation of decision-aid software (web-based as well as native interfaces and backends); visualization and diagrams (how to summarize complex data or concepts in a visual way); integration of third-party modules (how to design an API to use external services, how to integrate software that does not really want to be integrated).

    Automated decision : score function modelling (how to design a metric defining the quality of a solution to a decision problem, while maintaining good mathematical properties); optimization problem modelling (how to design a formal model of a decision problem to be automatically solved by a computer); solving automated configuration problems (how to set the parameters of a complex system so as to maximize its performance).

    Scientific computing : efficient algorithmics (how to cope with combinatorial explosion or the curse of dimensionality when implementing complex algorithms); highly modular software architectures (how to structure your code to allow efficient, and automated, exploration of your ideas); modern C++ (how to program in C++ using almost the same concepts as in Python); shell scripting (how to use the existing Unix tools to automate any task very efficiently).

    Artificial Intelligence : search heuristics, metaheuristics or evolutionary computation (how to solve hard optimization problems); design of experiments for randomized algorithmics (how to design experiments involving modern AI, using rigorous statistics); automated planning (how to compute shortest paths, and more generally optimize sequences of actions); semantic graph mining (how to find patterns in an ontology).


    Keywords
    Algorithmics, Data Visualization
    Organisms
    Not applicable
    Projects (0)

      Related projects (32)

      Phylogenetic analysis of the Leishmania HSP70 protein family



      Project status : Closed

      Influence of chromatin dynamics on genomic stability during replication

      Genomic DNA is hierarchically packed within living cells, and genome duplication requires the concerted effort of many thousands of individual replication units. As such, to ensure the integrity of transmission of the genetic information, both eukaryotes and prokaryotes have evolved sophisticated mechanisms to monitor DNA replication. Some of these mechanisms aim to maintain both a temporal and a spatial organization of the replication program, leading to multiple replication time regions and to compartmentalization into replication foci, subnuclear sites that accumulate numerous DNA replication factors. It should be noted that Saccharomyces cerevisiae represents an exception to the standard eukaryotic strategy for genome duplication. Similar to bacteria, S. cerevisiae possesses well-defined replication origin sequences that can fire at a very efficient rate during S phase, leading to a very homogeneous pattern of DNA replication. A common model suggests that, once replication starts, dynamic events take place, since co-regulated replication forks with similar replication timing cluster within a discrete number of foci that show distinct patterns of nuclear localization over the S phase. Once initiated, DNA synthesis might be compromised if the replication fork encounters an RFB (Replication Fork Barrier) such as DNA lesions, tightly bound protein-DNA complexes, etc. RFBs are considered a potential source of genetic instability and may lead to many chromosomal rearrangements. As a consequence, eukaryotes employ a complex DNA damage response against RFBs, which aims to maintain the stability of the stalled forks and provides the time required to repair and resume replication. Recent observations suggest that the non-random organization of the nucleus affects where repair occurs. The aim of this project is to reach a better understanding of the influence of nuclear spatial architecture and organization at replication fork blocks.



      Project status : Closed

      MicrocystOmics

      Cyanobacteria are microorganisms that proliferate in many water bodies and disrupt their functioning and uses, because they can produce toxins that are dangerous to human and animal health. Although health regulations are currently based on monitoring a single toxin, it is now known that these microorganisms can synthesize a large number of toxins, which should be better taken into account in the future. Therefore, in order to better understand the toxic potential of cyanobacteria, my thesis aims, through studies of their genomes and a chemistry-based approach, to characterize the genes involved in the synthesis of these metabolites as well as the metabolites produced by these genes; to determine, in cultured strains and in natural samples from water bodies in the Ile-de-France region, the potential for production of these metabolites; and to better understand the environmental factors that favor this production. Two teams in Paris (Pasteur and iEES) are associated in this work, which also involves international collaborations. While it is now well established that a large part of the metabolism of cyanobacteria, which are photosynthetic microorganisms, is regulated according to light and dark phases, the available knowledge on the synthesis of secondary metabolites is much more limited. Yet these metabolites are of twofold interest, since some are toxic to humans while others have potential pharmaceutical value. Their synthesis relies on the expression of gene clusters that can be very large (up to 100 kb per region).



      Project status : Closed

      Listeriomics - Development of a web platform for visualization and analysis of Listeria omics data

      Over the past three decades Listeria has become a model organism for host-pathogen interactions, leading to critical discoveries in a broad range of fields including virulence-factor regulation, cell biology, and bacterial pathophysiology. More recently, the amount of Listeria “omics” data produced has increased exponentially, not only in terms of number, but also in terms of heterogeneity of data. There are now more than 40 published Listeria genomes, around 400 different transcriptomics datasets and 10 proteomics studies available. The capacity to analyze these data through a systems biology approach and to generate tools that let biologists analyze these data themselves is a challenge for bioinformaticians. To tackle these challenges we are developing a web-based platform named Listeriomics, which integrates different types of tools for “omics” data manipulation, the two most important being: 1) a genome viewer for displaying gene expression array, tiling array, and RNASeq data along with proteomics and genomics data; 2) an expression atlas, a query-based tool that connects every genomic element (genes, smallRNAs, antisenseRNAs) to the most relevant “omics” data. Our platform already integrates all genomics and transcriptomics data ever published on Listeria and will thus allow biologists to analyze all these data dynamically, and bioinformaticians to have a central database for network analysis. Finally, it has already been used several times in our laboratory for different types of studies, including transcriptomics analysis in different biological conditions and whole-genome analysis of Listeria protein N-termini. This project is funded by an ANR Investissement d'avenir: BACNET 10-BINF-02-01



      Project status : Closed

      Comparative analysis of the virulence plasmids of Shigella Spp. and entero-invasive Escherichia coli

      Context. Bacteria of the genus Shigella and strains of entero-invasive Escherichia coli (EIEC) are responsible for bacillary dysentery (shigellosis) in humans. Although (very) closely related to E. coli, the genus Shigella is divided into four "species": S. boydii, S. dysenteriae, S. flexneri and S. sonnei. Most virulence determinants enabling these bacteria to enter into and disseminate within epithelial cells are encoded by a 200-kb virulence plasmid (VP). The first complete sequence of a VP (pWR100 from a S. flexneri strain of serotype 5a) was determined by our laboratory in 2000. The VP contains genes of different origins, as attested by their G+C content ranging from 30 to 60%, traces of four plasmids and a large number of various insertion sequences (IS) representing 30-40% of the total sequence (Buchrieser et al., 2000). In addition to IS sequences, the VP carries members of several multigene families (exhibiting over 90% identity). Such repeated sequences are potentially prone to recombination (allelic exchange, gene conversion) and deletion. Based on the analysis of three genes carried by the VP, it has been proposed that, depending on the species / phylogenetic group, there are two forms of the VP (pInvA & pInvB) that were acquired independently in different original E. coli strains. General questions. What are the architectures of the VP from different phylogenetic groups, and how different are pInvA and pInvB? Which genes are conserved in all VP and which genes are unique to some VP? Did recombinations occur and, if so, where and when? To answer these questions, a comparative analysis of the genetic organization and gene conservation among the VP from different phylogenetic groups of Shigella/EIEC has been undertaken using the available complete (or presented as such) sequences of 15 VP, including three members for each of five phylogenetic groups (S. boydii, S. dysenteriae 1, S. flexneri, S. sonnei and EIEC).



      Project status : Closed

      JASS: an online tool for the joint analysis of GWAS summary statistics

      In recent years, large genome-wide association studies (GWAS) have been successful in identifying thousands of significant genetic associations for multiple traits and diseases [1]. In the course of this endeavor, sample size has proven to be the key factor for identifying new variants. For example, GWAS of body mass index (BMI), now including up to 350,000 individuals from more than 100 cohorts, have been able to identify genetic variants that explain as little as 0.02% of BMI variance [2]. While standard approaches for detecting new genetic variants associated with traits and diseases will go on as sample size increases, multivariate analyses have been proposed as an alternative strategy for both improving detection of new variants and exploring the multidimensional components of complex traits and diseases. Intuitively, multivariate analysis can be used to improve detection of variants displaying a pleiotropic effect [3] by accumulating moderate evidence of association across multiple traits and diseases. Several recent examples have been published not only of GWAS hit overlap across related traits [4], but also of genome-wide shared genetic effects [5]. Multivariate analyses of GWAS have also proven useful for understanding shared genetics between diseases [5] and potential causal relationships between phenotypes using Mendelian randomization (MR) [6]. Importantly, most existing multivariate methods are based on GWAS summary statistics, while approaches based on individual-level data have seldom been considered because of major practical and ethical issues. In continuity with ongoing work on multi-phenotype analysis (Aschard et al. 2014 [7], Aschard et al. 2015 [8]), we developed an effective and robust multivariate approach for GWAS summary statistics that addresses the major barrier of existing approaches, i.e. the correlation between studies that exists when the GWAS analyzed share samples [9-16].
Our approach consists in a robust omnibus multivariate test of GWAS summary statis



      Project status : Closed

      Coaching in R

      Background : The Immunoregulation Unit is composed at present of the Lab Head, one senior staff scientist, two engineers (one IP, one Fondation APHP) and two PhD students. The focus of the research is the study of immuno-mechanisms in the pathogenesis of chronic inflammatory diseases, and of the molecular basis of response to treatment. All members of the lab have expressed an interest in acquiring basic R skills for the analysis of large gene expression data sets collected from the study of patients’ samples. Only one PhD student in the lab is currently using R tools for data analysis; the other members have all received some previous instruction in biostatistics and/or R, but do not currently use R tools.

      Requirements : Lab members have expressed a specific need to be instructed in the following areas:
      1. refreshing basic notions of the R language, with a particular focus on handling large gene expression data sets
      2. introduction to graphic tools for data representation (ggplot)
      3. introduction to tools for differential gene expression analysis (limma)
      4. data exploration using Principal Component Analysis

      Constraints :
      1. For the course, it has been decided to use data generated by the lab. Placing the course in the context of the lab’s data has the advantage of ensuring that the course content is adapted to the “real-life” situations that lab members face in the analysis of their own data.
      2. The present situation of “confinement” due to the COVID-19 pandemic offers some opportunities (all members have time available to follow the instruction), but also obvious restrictions. To maintain interactivity, classes are held at a distance, using Skype group meetings.



      Project status : In Progress

      Genetic diversity of yellow fever virus populations in mosquitoes

      Yellow fever (YF) is a fatal hemorrhagic disease caused by an arbovirus, Yellow Fever Virus (YFV), transmitted by the Aedes aegypti mosquito vector. YFV is currently endemic in Africa (5 strains) and South America (2 strains), and although an effective vaccine is available, YFV remains a major public health issue. Native to Africa, YFV was transported to South America and the Caribbean during the slave trade (15th to 19th century) and caused devastating outbreaks. After two centuries of outbreaks, YF is no longer reported in the Caribbean thanks to the successful mosquito control program implemented at the beginning of the 20th century. The virus disappeared from the Caribbean, and Martinique experienced its last outbreak in 1908. However, the relaxation of vector control programs contributed to the reintroduction of Ae. aegypti in most Caribbean islands. We evaluated the vector competence for YFV of different populations of Ae. aegypti from Martinique and of other mosquito populations of the Caribbean. Our results showed that Ae. aegypti from Martinique were competent for African as well as American YFV genotypes, although with differences in viral dissemination and transmission. In this project, we aim to determine the diversity of viral populations at the crossing of two anatomical barriers in mosquitoes: the midgut and the salivary glands. The virus must enter the midgut epithelial cells, replicate and be released into the hemocoel. The final step is the infection of the salivary glands, from where the virus is excreted with the saliva. At each step, a selection of viral populations operates, with only a fraction of the viral population transmitted between different tissues within the same host. The virus diversification promoted by virus replication in mosquitoes depends on the mosquito species. Such vector-specific conditions may have great impacts on emergence processes.



      Project status : Pending