Expertise

Hub members have a broad range of expertise, covering most fields of bioinformatics and biostatistics. Below is a non-exhaustive list of these areas of expertise.


Searched keyword : Statistics

Related people (15)

Christophe BÉCAVIN

Group : GORE - Hub Core

CV
Senior Bioinformatician, August 2015 – present: Institut Pasteur, Paris
Postdoctoral fellow, 2011 – 2015: Pascale Cossart's laboratory, Unité des Interactions Bactéries-Cellules, Institut Pasteur, Paris
PhD fellow, 2007 – 2010: Institut des Hautes Etudes Scientifiques and Ecole Normale Supérieure, Paris
Magister of Science, Theoretical Physics, 2003 – 2007: Dynamical systems and statistics of complex matter, Université Paris 7 and Université Paris 6


Keywords
Biophysics, Machine learning, Modeling, Proteomics, Biostatistics, Databases and ontologies, Host-pathogen interactions
Organisms
Listeria, Leishmania
Projects (12)

Freddy CLIQUET


One of my projects consists of developing GRAVITY, a Java tool based on Cytoscape that integrates genetic variants into protein-protein interaction networks to allow the visual and statistical interpretation of next-generation sequencing data, ultimately helping geneticists and clinicians to identify causal variants and better diagnose their patients. I am also involved in several other projects in the lab, taking part in the design of pipelines for processing and analysing genomics data, including SNP arrays, whole-exome and whole-genome sequencing data. This means dealing with big-data challenges, as the unit manages hundreds of terabytes of genomics data. Finally, I am now analysing these data to identify possible causes of autism, both to help clinicians with their diagnoses and to better understand the biological mechanisms at play in this complex disease. This is done through a project aiming to understand the genetic architecture of autism in the Faroe Islands, and through the newly started IMI2 European project AIMS2-Trials.


Keywords
Algorithmics, Data management, Data Visualization, Genomics, Machine learning, Proteomics, Genome analysis, Biostatistics, Program development, Scientific computing, Application of mathematics in sciences, Exploratory data analysis, Software development and engineering, Data and text mining, Genetics
Organisms

Projects (0)

    Marie-Agnès DILLIES

    Group : HEAD - Hub Core

    I obtained an engineering degree in Biomedical Engineering from the Université de Technologie de Compiègne (UTC) in 1989, a master's degree in Control of Complex Systems from UTC in 1990, a PhD in Control of Complex Systems from UTC in 1993, a University Degree in Human Genetics from the University of Rennes 1 in 2001 and a master's degree in Functional Genomics from University Paris Diderot (Paris 7) in 2002. I worked as a statistician at the Transcriptome and Epigenome Platform from 2002 to 2017, where I was responsible for the statistical analyses of the data and was heavily involved in training, both on campus and elsewhere. Since 2015 I have been co-head of the Bioinformatics and Biostatistics Hub within the Center of Bioinformatics, Biostatistics and Integrative Biology (C3BI). I am co-director of the Pasteur course Introduction to Data Analysis and co-organiser of the sincellTE summer school, which is dedicated to single-cell transcriptome and epigenome data analysis. I also co-manage the StatOmique group, which brings together more than 60 statisticians from across France.


    Keywords
    RNA-seq, Statistical inference, Transcriptomics, Biostatistics, Application of mathematics in sciences, Exploratory data analysis, Illumina HiSeq, Statistical experiment design, Sequencing
    Organisms

    Projects (4)

    Amine GHOZLANE

    Group : SINGLE - Detached : Biomics

    After a PhD in informatics on graph analysis (metabolic networks and sRNA-mRNA interaction graphs) at the LaBRI (Université de Bordeaux), I joined the DSIMB team (INTS) for a postdoc on structural modeling. I then did a second postdoc at Metagenopolis, INRA Jouy-en-Josas, where I was introduced to metagenomic data analysis. I joined the Hub in 2015, and since then I have been developing methods for metagenomic data analysis that combine sequencing data processing, statistics, protein structural modeling and graph analysis.


    Keywords
    Algorithmics, Clustering, Genome assembly, Genomics, Metabolomics, Modeling, Non coding RNA, Sequence analysis, Structural bioinformatics, Targeted metagenomics, Database, Genome analysis, Biostatistics, Program development, Scientific computing, Databases and ontologies, Exploratory data analysis, Data and text mining, Illumina HiSeq, Comparative metagenomics, Read mapping, Illumina MiSeq, Sequence homology analysis, Gene prediction, Multidimensional data analysis, Sequencing, Shotgun metagenomics
    Organisms

    Projects (21)

    Quentin GIAI

    Group : - Hub Core


    Keywords

    Organisms

    Projects (0)

      Bernd JAGLA

      Group : PLATEFORM - Detached : Biomarker Discovery

      Bernd Jagla received his PhD in bioinformatics (Department of Biology, Chemistry, and Pharmacy) from the Free University of Berlin, Germany, in 1999. Before joining the Institut Pasteur, he worked for almost ten years in New York City, including as an associate research scientist in the Joint Centers for Systems Biology (Columbia University) and at the Columbia University Screening Center led by Dr J.E. Rothman. He joined the Institut Pasteur in 2009 to take charge of the bioinformatics needs of the Transcriptome et Epigenome platform, focusing on Next Generation Sequencing. Since 2016 he has been a member of the C3BI HUB Team, detached to the Human Immunology Center (CIH), where he provides support for cytometry, next generation sequencing and microarray data analysis. His areas of interest include quality assurance, data analysis and visualization at the facility. He also has strong expertise in developing algorithms for function prediction from sequence data, image analysis, mass spectrometry data analysis and workflow management systems. While at Pasteur he developed KNIME extensions for Next Generation Sequencing, Post Alignment Visualization and Characterization of High-Throughput Sequencing Experiments, and Post Alignment statistics of Illumina reads.


      Keywords
      Algorithmics, ChIP-seq, Data management, Data Visualization, Image analysis, Machine learning, Sequence analysis, Database, Genome analysis, Biostatistics, Program development, Scientific computing, Data and text mining, Illumina HiSeq, Graphics and Image Processing, Illumina MiSeq, High Throughput Screening, Flow cytometry/cell sorting, Pac Bio
      Organisms

      Projects (1)

      Thomas OBADIA


      Thomas is a biostatistician who holds an engineering degree in Agronomy (Agrocampus Ouest, Rennes, France). He also holds a Ph.D. in biostatistics from Université Pierre et Marie Curie for his work on the spread of nosocomial pathogens on contact networks. During his Ph.D. at INSERM, he investigated how high-resolution dynamic contact data could support infection tracing conducted with more traditional approaches in healthcare settings, e.g. routine swabbing and genetic characterization of strains detected in patients or healthcare workers. He developed a new statistical framework to test the correlation between dynamic close-proximity interaction networks and biological carriage data. While at INSERM, he also developed the R0 package for R, which implements several methods for estimating reproduction parameters of emerging transmissible diseases. After working as a statistical modeller for a private company in the pharmaceutical industry, he joined the Hub in 2016 as a statistician and is now involved in the projects of the Malaria: parasites and hosts unit headed by Ivo Mueller.


      Keywords
      Modeling, Biostatistics, Scientific computing, Application of mathematics in sciences, Clinical research, Epidemiology and public health
      Organisms

      Projects (3)

      Emeline PERTHAME

      Group : Stats - Hub Core

      Since February 2017: Research engineer, Hub of Bioinformatics and Biostatistics of the C3BI, Institut Pasteur
      2015-2017: Postdoctoral position, team MISTIS, INRIA Grenoble. Topic: Robust clustering and robust nonlinear regression in high dimension. Collaboration with Florence Forbes (INRIA).
      2012-2015: PhD thesis in Statistics, Applied Mathematics Department of Agrocampus-Ouest, IRMAR UMR 6625 CNRS, Rennes. Topic: Stability of variable selection in regression and classification issues for correlated data in high dimension. Supervisor: David Causeur (Agrocampus-Ouest, IRMAR).
      Education
      2015: PhD thesis in Statistics, Applied Mathematics Department of Agrocampus-Ouest, IRMAR UMR 6625 CNRS, Rennes
      2012: ISUP degree (Institut de Statistique de l'UPMC), Université Pierre et Marie Curie, Paris
      2012: Master 2 of Statistics, Université Pierre et Marie Curie, Paris


      Keywords
      Clustering, Modeling, Statistical inference, Transcriptomics, Biostatistics, Exploratory data analysis, Dimensional reduction, Statistical experiment design, Multidimensional data analysis
      Organisms

      Projects (16)

      Natalia PIETROSEMOLI

      Group : SysBio - Hub Core

      Dr. Natalia Pietrosemoli is an engineer with an M.Sc. in Modeling and Simulation of Complex Realities from the International Center for Theoretical Physics (ICTP) and the International School for Advanced Studies (SISSA) in Trieste, Italy. During her M.Sc. internships she mostly worked on modeling, optimization, combinatorics and information theory applied to medical imaging. In 2012 she received a Ph.D. in Computational Biology from the School of Bioengineering of Rice University (Houston, TX, US), where she specialized in computational structural biology and functional genomics. Her doctoral thesis, "Protein functional features extracted from primary sequences: a focus on disordered regions", contributed to a better understanding of the functional and evolutionary role of intrinsic disorder in protein plasticity, complexity and adaptation to stress conditions. As part of her Ph.D., Natalia was a visiting scholar in two labs in Madrid: the Structural Computational Biology Group at the Spanish National Cancer Research Centre (CNIO), where she mainly worked on sequence analysis and the functional-structural relationships of proteins, and the Computational Systems Biology Group at the Spanish National Centre for Biotechnology (CNB-CSIC), where she studied the functional implications of intrinsically disordered proteins at the genomic level in several organisms, collaborating with various experimental and theoretical groups. In 2013 she joined the Swiss Institute of Bioinformatics as a postdoctoral fellow in the Bioinformatics Core Facility. Her main project was the molecular classification of a rare type of lymphoma, which involved integrating transcriptomic, clinical and mutational data to identify molecular markers for classification, diagnosis and prognosis. This work was performed in collaboration with the Pathology Institute of the University Hospital of Lausanne (CHUV).
In November 2015 Natalia joined the Hub Team @ Pasteur C3BI as a Senior Bioinformatician. Natalia is especially interested in the integrative analysis of different omics data, both at large scale and for small datasets, and loves collaborating in interdisciplinary environments and receiving feedback from her experimental colleagues. She is currently coordinating several projects performing functional and pathway analysis at the genomic level. Grouping genes, proteins and other biological molecules into the pathways they are involved in significantly reduces the complexity of the analyses and increases their explanatory power compared with a plain list of differentially expressed genes or proteins.
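One common way to move from a list of differentially expressed genes to pathways is an over-representation test: for each pathway, ask whether the list contains more pathway members than expected by chance under a hypergeometric model. The Python sketch below illustrates that generic technique under this assumption only; it is not presented as the method used in these projects, and the function name `enrichment_pvalue` and the example counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_genes, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    n_genes:    genes in the background (e.g. all genes tested)
    n_pathway:  background genes annotated to the pathway
    n_selected: genes in the list of interest (e.g. differentially expressed)
    n_overlap:  genes in the list that also belong to the pathway
    Returns P(X >= n_overlap) for X ~ Hypergeometric(n_genes, n_pathway, n_selected).
    """
    total = comb(n_genes, n_selected)
    upper = min(n_pathway, n_selected)
    # Sum the probabilities of every overlap at least as large as the observed one;
    # comb() returns 0 when a draw is impossible, so no extra bounds check is needed.
    tail = sum(
        comb(n_pathway, k) * comb(n_genes - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes tested, a 100-gene pathway,
# 500 differentially expressed genes, 15 of which fall in the pathway.
print(enrichment_pvalue(20000, 100, 500, 15))
```

Exact integer binomial coefficients are used so that large backgrounds do not lose precision before the final division. When many pathways are tested, the resulting p-values would normally be corrected for multiple testing.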


      Keywords
      Algorithmics, Data management, Genomics, Image analysis, Machine learning, Modeling, Proteomics, Sequence analysis, Structural bioinformatics, Transcriptomics, Database, Genome analysis, Biostatistics, Scientific computing, Databases and ontologies, Application of mathematics in sciences, Data and text mining, Genetics, Graphics and Image Processing, Biosensors and biomarkers, Clinical research, Cell biology and developmental biology, Interactomics, Bioimage analysis
      Organisms

      Projects (26)

      Hugo VARET

      Group : PLATEFORM - Detached : Biomics

      Hugo Varet is a biostatistician engineer from the Ensai (Ecole Nationale de la Statistique et de l'Analyse de l'Information) and was recruited by the Hub of the C3BI (Center of Bioinformatics, Biostatistics and Integrative Biology) to work at the Transcriptome & Epigenome Platform. He is in charge of the statistical analyses of the RNA-Seq data produced by the platform and develops R pipelines for this task. One of them, SARTools, is available on GitHub: https://github.com/PF2-pasteur-fr/SARTools.


      Keywords
      Modeling, Sequence analysis, Statistical inference, Transcriptomics, Biostatistics, Scientific computing, Application of mathematics in sciences, Exploratory data analysis, High Throughput Screening, Clinical research
      Organisms

      Projects (17)

      Stevenn VOLANT

      Group : Stats - Embedded : Perception and Memory | Biomics

      After graduating as a statistical engineer from the Ensai (Ecole Nationale de la Statistique et de l'Analyse de l'Information) and completing a Ph.D. in applied mathematics in the Statistics & Genome lab (AgroParisTech), I worked as a developer for the XLSTAT software, where I implemented statistical methods such as mixture models, log-linear regression, the Mood test and Bayesian hierarchical modeling (CBC/HB), … I then worked as a head teacher in statistics for one year. In 2014 I was recruited by the Bioinformatics and Biostatistics Hub of the C3BI (Center of Bioinformatics, Biostatistics and Integrative Biology), where I am in charge of statistical analyses and the development of R/R Shiny pipelines.


      Keywords
      Machine learning, Statistical inference, Targeted metagenomics, Biostatistics, Application of mathematics in sciences, Statistical experiment design
      Organisms

      Projects (26)

      Related projects (76)

      Mapping the cell surface signature of the developing mouse heart

      Cell surface protein signatures have been successful in discriminating hematopoietic progenitor populations, allowing major advances in understanding blood cell production, defining pathways in hematologic malignancies and fostering new therapeutic approaches. Limited knowledge of the phenotype of the cells that participate in heart formation hampers our understanding of the progenitors of the cardiac cell lineages and of their possible persistence in the adult organ. As a consequence, therapies to restore heart function after injury have been unsuccessful. A number of membrane proteins have been identified on cardiomyocytes, cardiac fibroblasts and endothelial cells; however, a multi-parametric analysis of the phenotype of the different cardiac cell compartments during development is still missing. We combined multi-parametric flow cytometry with transcriptional characterization, based on well-known gene expression patterns, to describe the major cardiac cell subsets. The expression of CD24, CD54, Sca-1 and CD90 made it possible to define cardiac populations in the non-hematopoietic, non-endothelial cell fraction by flow cytometry. Transcriptional profiling of the sorted populations identified cardiomyocytes within the CD24+ population, while differential expression of CD54, Sca-1 and CD90 defined four cardiac stromal compartments. The identified subsets showed specific distributions in the three regions analyzed (atria, auriculo-ventricular junction and ventricles). We have thus identified a panel of surface markers, some of which are novel in the cardiac context, that allowed us to assign surface signatures to different cellular fractions with unique transcriptional profiles. This work lays the foundation for comprehensive studies of the role of these different cell fractions.



      Project status : Closed

      Characterisation of skeletal muscle stem cell properties in distinct physiological states

      Stem cells are defined by their capacity for self-renewal and differentiation. Some adult tissues maintain a reservoir of stem cells, which generally reside within specialized microenvironments, known as stem cell niches, that regulate their behaviour. Skeletal muscle stem (satellite) cells are quiescent under homeostatic conditions in adults, and they are activated after muscle injury, when they re-enter the cell cycle, proliferate and differentiate into myoblasts, which then fuse to form new muscle fibers. Satellite cells express the paired/homeodomain gene Pax7, which plays a critical role in satellite cell maintenance postnatally. Numerous experiments have shown that the skeletal muscle stem cell population is heterogeneous; therefore, as in many other stem cell systems, characterising its stem cell states is a major objective. In our laboratory, a reversible dormant cell state was identified, corresponding to a Pax7Hi quiescent subpopulation (the top 10% of the Pax7-nGFP+ cells isolated from the transgenic mouse model Tg:Pax7-nGFP) with lower metabolic activity and a longer lag before the first cell division than Pax7Lo cells [1]. Muscle stem cells that survive for extended periods post-mortem are also dormant, suggesting that this property, in addition to anoxia [2], contributes to their viability. Therefore, different physiological states are associated with distinct states of muscle stem cells. Metabolism could play a critical role in dictating whether a cell remains quiescent, proliferates or differentiates. Stem cell metabolic plasticity in homeostasis and differentiation, as well as during cell reprogramming, is well described in different cell systems. However, questions remain unanswered regarding the metabolic regulation of satellite cell biology and skeletal muscle regeneration. In this project, we will investigate the behaviour of muscle stem cells in distinct physiological states, especially post-mortem and in aging.



      Project status : Closed

      Identification of new cellular parameters involved in HIV-1 integration selectivity

      HIV-1 replication requires the integration of the viral genome into the cell genome. A virally encoded enzyme, integrase (IN), performs this critical step of infection and is a promising target for antiviral therapeutics. Although the catalytic properties of INs are well characterized, the mechanisms responsible for their site selectivity are still under investigation. Several cellular proteins, such as the LEDGF/p75 transcription co-activator, the RNA polymerase II machinery, nuclear pore proteins and specific modified histones, have been proposed to be involved in IN selectivity at the genomic level, but the underlying molecular mechanisms remain to be demonstrated. In addition, structural parameters of the target DNA helix (curvature, flexibility, topology) are proposed to regulate IN selectivity at the local level. Our aim is to study the role of these different parameters in IN selectivity, using both in vitro and in vivo approaches. In vitro, we will map integration sites on various target DNA substrates (naked DNA or chromatin, minicircles, plasmids with different topologies, transcribed templates) and will test the effect of purified proteins suspected to regulate IN selectivity. In vivo, integration sites will be mapped in cells depleted of these suspected regulators or in cells incubated with drugs targeting enzymes involved in transcription, DNA topology or histone modifications. Integration sites will be mapped using published or "home-made" protocols, and the sites will be compared with DNA structural parameters, nucleosome positions, histone modifications or transcriptional parameters (published maps). Bioinformatics tools are crucial for these correlative and statistical analyses of integration sites. Our project relies on complementary in vivo, in vitro and in silico approaches.
It should establish molecular and mechanistic rules of HIV-1 integration selectivity that could serve in the development of new antiviral strategies and safer gene therapy vectors.



      Project status : Closed

      MicrocystOmics

      Cyanobacteria are microorganisms that proliferate in many water bodies and disrupt their functioning and uses, because they can produce toxins that are dangerous to human and animal health. Although health regulations are currently based on monitoring a single toxin, it is now known that these microorganisms can synthesize a large number of them, which should be better taken into account in the future. Therefore, to better understand the toxic potential of cyanobacteria, my thesis combines genome studies with a chemistry approach in order to characterize the genes involved in the synthesis of these metabolites and the metabolites produced by these genes; to determine, in culture strains and in natural samples from water bodies in Île-de-France, the potential for producing these metabolites; and to better understand the environmental factors that favour this production. Two Paris teams (Pasteur and iEES) are associated in this work, which also involves international collaborations. While it is now well known that a large part of the metabolism of cyanobacteria, which are photosynthetic microorganisms, is regulated according to light and dark phases, available knowledge on the synthesis of secondary metabolites is much more limited. These metabolites are nevertheless of dual interest, since some are toxic to humans while others have potential pharmaceutical value. Their synthesis relies on the expression of gene clusters that can be very large (up to 100 kb per region).



      Project status : Closed

      Deciphering dormancy in Cryptococcus neoformans



      Project status : Closed

      RNA-seq analysis and gene ontology enrichment in Clostridium tetani



      Project status : In Progress

      Secretome analysis of human intestinal cells during Shigella invasion



      Project status : In Progress

      Deciphering dormancy in Cryptococcus neoformans



      Project status : Awaiting Publication

      Regulation of HIV-1 integration selectivity by chromatin

      Integration of the reverse-transcribed viral genome into the genome of infected cells is an essential step of retroviral replication and is performed by a virally encoded enzyme named integrase (IN). In the case of HIV-1, IN is a new and efficient antiviral target. The selectivity of this enzyme for its cellular genomic sites is also a major parameter of HIV replication and is regulated by several cellular parameters. One of them is chromatin, and different levels of this nucleoprotein complex are involved in the regulation of IN selectivity. Using in vitro integration assays established by our team and collaborators, we have studied this regulation at two levels of chromatin architecture: large poly-nucleosome templates (Botbol et al., 2008; Lesbats et al., 2011; Benleulmi et al., 2015; Naughtin et al., 2015) and nucleosome-induced DNA curvature mimicked by DNA minicircles (Pasi et al., 2016). Our present project is to study IN selectivity on mononucleosomes (MN). These MNs will be used as target substrates for integration, and the roles of MN structure, histone modifications and IN cofactors will be studied. Results obtained in vitro will be compared with structural data obtained by molecular modeling and with integration sites observed in infected cells. This project will benefit from our expertise in integration into chromatin templates and from a previous collaboration with the C3BI on the analysis of integration sites (Pasi, M., Mornico, D., Volant, S., et al., 2016). This project is funded by the ANRS.



      Project status : In Progress

      Transcriptional regulation of innate lymphoid cell plasticity versus differentiation

      In recent years, innate lymphoid cells (ILC) have been increasingly investigated. Despite lacking antigen-specific receptors, they belong to the lymphoid lineage and are important sentinels of tissue homeostasis and inflammation. They contribute to numerous homeostatic and pathophysiological situations through specific cytokine production. ILC are currently divided into three groups based on the expression of specific transcription factors and the secretion of cytokines. This study focuses on fetal ILC3 development. We have observed that, contrary to lymphocytes, ILC can migrate to lymphoid organs, tissues and mucosal sites as lymphoid precursors and complete their developmental program in situ. In the fetal spleen, we observe different stages of ILC3, with precursors that are already RORgt+ but could still give rise to other ILC fates. These splenic ILC3 precursors were therefore sorted and analyzed by microarrays. The gene expression differences identified were used to design a single-cell transcriptomic assay based on a specific selection of primers for transcription factors and cytokine receptors. We evaluate their differential expression in single cells at different stages of their plasticity. The aim is to decipher the progression from one ILC precursor stage to another within a single cell. We are also using the new Polaris technology to detect and evaluate, at different early time points, the sequence of molecular events leading to changes in ILC fate; for this, we chose to use scRNA-seq. The single-cell transcriptomic data will be analyzed with bioinformatics programs in order to order the sequential molecular events and to build a hierarchical developmental model of ILC fate decisions.



      Project status : In Progress

      Mapping the genomic architecture of human neuroanatomical diversity

      Our recent analyses suggest that the genetic determinants of human neuroanatomical diversity are massively polygenic. Like other quantitative traits such as height – but also IQ or ASD risk – neuroanatomical diversity seems to result from the aggregated effect of thousands of frequent variants, each of small effect. GWAS would then require populations of hundreds of thousands of individuals to start detecting the individual variants. GCTA (genome-wide complex trait analysis) offers an alternative approach for obtaining valuable neurogenetic information despite the current impossibility of detecting enough individual variants to explain any substantial part of the variability. We are currently pooling neuroimaging genomics data from multiple international projects (in particular IMAGEN, ENIGMA and UK Biobank) to replicate and extend our earlier analyses. We aim to: (1) compute the amount of variance captured by genome-wide SNPs (SNP-heritability) for several brain regions: ICV (intracranial volume), BV (brain volume), Hip (hippocampus), Th (thalamus), Ca (caudate), Pa (pallidum), Pu (putamen), Amy (amygdala) and Acc (accumbens); (2) compute the matrix of SNP-based genetic correlations among structures; (3) partition the variance captured by SNPs among structural and functional sets: per chromosome, genic vs non-genic, low/medium/high minor-allele frequency, positive/negative selection, involved or not in neurodevelopment, etc.; (4) compare our results with those obtained using GWAS-based estimations (for example, those used in ENIGMA2). GCTA requires computing matrices of genetic relationships among all individuals, and thus direct access to the genotyping data. Once the matrices are computed, the genotyping data are no longer required, and it is not possible to reconstruct an individual's genome from the matrices. Our analysis of the IMAGEN cohort was based on 1,765 individuals, which gave us 80% statistical power to detect only strong heritabilities (h2~45%), and the estimates had very large standard errors (~20%).
A cohort of 4,000 subjects should allow us to decrease the standard error to ~8% (80% power to detect h2=22%), and a cohort of 8,000 subjects should decrease it to ~4% (80% power to detect h2=11%). In this way, we could obtain more accurate estimates and eventually detect more subtle effects related to functional genomic partitions.
References
Yang et al. (2010) Common SNPs explain a large proportion of heritability for human height. Nature Genetics, doi: 10.1038/ng.608
Davies et al. (2011) Genome-wide association studies establish that human intelligence is highly heritable and polygenic. Molecular Psychiatry, doi: 10.1038/mp.2011.85
Gaugler et al. (2014) Most genetic risk for autism resides with common variation. Nature Genetics, doi: 10.1038/ng.3039
Wood et al. (2014) Defining the role of common variation in the genomic and biological architecture of adult human height, doi: 10.1038/ng.3097



      Project status : In Progress
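The genetic relationship matrices that GCTA needs can be illustrated with the standardized estimator associated with that approach, A_jk = (1/N) Σ_i (x_ij − 2p_i)(x_ik − 2p_i) / (2p_i(1 − p_i)), where x_ij is the allele count of individual j at SNP i, p_i the allele frequency and N the number of SNPs. The Python sketch below is a minimal illustration under simplifying assumptions (allele frequencies estimated from the sample, no missing genotypes, the same formula used on the diagonal, whereas GCTA itself applies an adjusted estimator to diagonal entries); the function name is hypothetical and this is not the project's pipeline.

```python
import math


def genetic_relationship_matrix(genotypes):
    """Simple standardized genetic relationship matrix (GRM).

    genotypes[i][j] is the number of copies (0, 1 or 2) of the reference
    allele carried by individual j at SNP i. Each SNP is centred by 2p and
    scaled by sqrt(2p(1 - p)), with p estimated from the sample; the GRM is
    the average, over polymorphic SNPs, of the products of standardized genotypes.
    """
    n_ind = len(genotypes[0])
    grm = [[0.0] * n_ind for _ in range(n_ind)]
    n_used = 0
    for snp in genotypes:
        p = sum(snp) / (2 * n_ind)
        var = 2 * p * (1 - p)
        if var == 0:
            continue  # monomorphic SNP: carries no relatedness information
        n_used += 1
        z = [(x - 2 * p) / math.sqrt(var) for x in snp]
        # Accumulate the upper triangle only; the matrix is symmetric.
        for j in range(n_ind):
            for k in range(j, n_ind):
                grm[j][k] += z[j] * z[k]
    if n_used == 0:
        raise ValueError("no polymorphic SNPs")
    for j in range(n_ind):
        for k in range(j, n_ind):
            grm[j][k] /= n_used
            grm[k][j] = grm[j][k]
    return grm


# Toy example: 2 SNPs (rows) typed in 3 individuals (columns).
print(genetic_relationship_matrix([[0, 1, 2], [2, 1, 0]]))
```

The nested loops keep the sketch dependency-free; for cohorts of thousands of individuals and hundreds of thousands of SNPs, the same computation would be done as a matrix product of the standardized genotype matrix with itself.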

      JASS: an online tool for the joint analysis of GWAS summary statistics

      In recent years, large genome-wide association studies (GWAS) have identified thousands of significant genetic associations for multiple traits and diseases [1]. Throughout this effort, sample size has proven to be the key factor for identifying new variants. For example, GWAS of body mass index (BMI), now including up to 350,000 individuals from more than 100 cohorts, have identified genetic variants that explain as little as 0.02% of BMI variance [2]. While standard approaches for detecting new genetic variants associated with traits and diseases will continue as sample sizes increase, multivariate analyses have been proposed as an alternative strategy both to improve the detection of new variants and to explore the multidimensional components of complex traits and diseases. Intuitively, multivariate analysis can improve the detection of variants with a pleiotropic effect [3] by accumulating moderate evidence of association across multiple traits and diseases. Several recent studies have reported not only GWAS hit overlap across related traits [4] but also genome-wide shared genetic effects [5]. Multivariate analyses of GWAS have also proven useful for understanding shared genetics between diseases [5] and potential causal relationships between phenotypes using Mendelian randomization (MR) [6]. Importantly, most existing multivariate methods are based on GWAS summary statistics, while approaches based on individual-level data have seldom been considered because of major practical and ethical issues. Building on ongoing work on multi-phenotype analysis (Aschard et al. 2014 [7], Aschard et al. 2015 [8]), we developed an effective and robust multivariate approach for GWAS summary statistics that addresses the major barriers of existing approaches, i.e. the correlation between studies that arises when the analyzed GWAS share samples [9-16].
Our approach consists of a robust omnibus multivariate test of GWAS summary statis



      Project status : Awaiting Publication
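For readers unfamiliar with omnibus tests on summary statistics, a widely used generic form combines the per-trait z-scores z of one SNP into T = zᵀ R⁻¹ z, where R is the correlation between those z-scores under the null (induced, for instance, by shared samples); under the null, T follows a chi-square distribution with k degrees of freedom, k being the number of traits. The abstract above is truncated and does not state JASS's exact statistic, so the Python sketch below only illustrates this generic test under that assumption; all function names are hypothetical and it is not JASS's implementation or API.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def chi2_sf(x, k):
    """Upper tail P(X > x) of a chi-square with integer k degrees of freedom."""
    if x <= 0:
        return 1.0
    if k % 2 == 0:
        # Even k: exp(-x/2) * sum_{i=0}^{k/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd k: erfc(sqrt(x/2)) + sqrt(2/pi) exp(-x/2) sum_i x^(i-1/2) / (1*3*...*(2i-1))
    term, acc = math.sqrt(x), 0.0
    for i in range(1, (k - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        acc += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * acc


def omnibus_test(z, corr):
    """Return (T, p-value) for T = z' R^-1 z, compared with chi2 on len(z) df."""
    w = solve(corr, z)  # w = R^-1 z, without forming the inverse explicitly
    t = sum(zi * wi for zi, wi in zip(z, w))
    return t, chi2_sf(t, len(z))


# Hypothetical SNP with z = 2 for two traits whose z-scores correlate at 0.5.
print(omnibus_test([2.0, 2.0], [[1.0, 0.5], [0.5, 1.0]]))
```

Accounting for R matters: with the correlation of 0.5 in the example, the two concordant signals partly reflect the same samples, and the statistic is smaller than the naive sum of squared z-scores would suggest.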

      Genetic profile of patients with dyslexia

      Background: Dyslexia is characterized by difficulty in learning to read fluently and with accurate comprehension despite normal intelligence. It affects 5–10% of school-age children. Familial studies have repeatedly shown that first-degree relatives of affected individuals have a 30–50% risk of developing the disorder. Twin studies have shown that heritability is approximately 50%, with a higher concordance rate for monozygotic than for dizygotic twins. Although genetic factors contribute to dyslexia, very little is known about the genes associated with the condition. Preliminary data: Our project consists of the complementary analysis of (i) a cohort of 209 patients with dyslexia, 89 relatives and 95 very well phenotyped controls and (ii) an extended pedigree (the Nantaise family) with 12 members diagnosed with dyslexia across three generations. For all individuals in the project, we genotyped >600K SNPs in order to detect SNP associations and copy-number variants (CNVs). For the extended pedigree, we also used linkage analysis and whole-genome sequencing (WGS). Our preliminary results indicate that a single region on chromosome 7q36 segregates with dyslexia in the Nantaise family. The region is located within CNTNAP2, a gene previously proposed as a susceptibility gene but without formal proof of its association. WGS of three affected and three unaffected individuals of the pedigree was performed to detect all the variants in the linkage region. Project: We propose to use this resource, unique in France, to characterize the genetic profile of patients with dyslexia. We will (i) detect the CNVs present in the patients and (ii) detect the variants in the linkage region.



      Project status : In Progress

      ModeMood: Modeling Mood Disorders



      Project status : Awaiting Publication

      Infection of Ixodes ricinus by Borrelia burgdorferi sensu lato in peri-urban forests of France

      Lyme borreliosis is the most common tick-borne disease in the northern hemisphere. In Europe, it is transmitted by Ixodes ticks carrying bacteria of the Borrelia burgdorferi sensu lato complex. Our study focused on peri-urban forests of Île-de-France, which are frequented by many visitors and where the risk of exposure to tick bites is high. One of them, the Sénart forest, is located 30 km south of Paris and receives a large number of visitors (3 million per year in the late 1990s). This forest is partly invaded by chipmunks (Tamias sibiricus). The chipmunk was introduced from Eurasia, particularly Siberia, China and Korea. The first individuals were released by their owners at the western end of the Sénart forest in the 1970s, and the northeastern part of the forest was colonized only recently. Our current study aims to evaluate how the infection of Ixodes ricinus by Borrelia burgdorferi s.l. has evolved, by comparing results obtained over 3 years, and to determine the consequences of the proliferation of this non-native rodent species, Tamias sibiricus, for the risk of Lyme borreliosis transmission. For this purpose, we analyzed the infection rate and the density of infected ticks in 2008, 2009 and 2011 at several locations in the Sénart forest. These results were compared with those obtained for ticks collected in 2009 in two other peri-urban forests of Île-de-France (Rambouillet and Notre-Dame) that have not yet been colonized by these rodents. The densities of nymphs and adults, as well as the densities of infected nymphs and adults, were compared according to several factors: location of tick collection in the forest, presence or absence of chipmunks, type of vegetation, temperature and humidity.
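Infection rate and infected-tick density are linked by a simple product: the density of infected nymphs (or adults) is the tick density multiplied by the proportion of tested ticks found infected. The following sketch computes both, with a Wilson score interval for the infection rate. It assumes densities are expressed per 100 m² of dragged area, a common convention that the text does not state; the function names and counts are hypothetical.

```python
import math
from typing import Dict, Tuple


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval (about 95% for z = 1.96) for a proportion k/n."""
    if n == 0:
        raise ValueError("no ticks were tested")
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def infected_density(n_collected: int, area_m2: float, n_tested: int,
                     n_positive: int, per_m2: float = 100.0) -> Dict[str, object]:
    """Tick density, infection rate and density of infected ticks for one site.

    Densities are expressed per `per_m2` square metres (assumed 100 m2 here).
    """
    ci = wilson_interval(n_positive, n_tested)  # also rejects n_tested == 0
    density = n_collected / area_m2 * per_m2
    prevalence = n_positive / n_tested
    return {
        "density": density,
        "prevalence": prevalence,
        "prevalence_ci": ci,
        "infected_density": density * prevalence,
    }


if __name__ == "__main__":
    # Hypothetical site: 120 nymphs dragged over 1,000 m2, 60 tested, 9 positive.
    print(infected_density(120, 1000.0, 60, 9))
```

The Wilson interval is used instead of the simple normal approximation because it stays within [0, 1] and behaves sensibly when few ticks are infected.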



      Project status : Closed

      Multi-traits GWAS in Malaria

      Malaria is a complex disease causing more than 700,000 deaths per year, most notably among children under 5 years of age in sub-Saharan Africa. Malaria is caused by several Plasmodium species, of which P. falciparum is responsible for the majority of deaths worldwide. Parasites are inoculated by infectious mosquitoes during their blood meal. Subsequent development of the parasite within the liver leads to the blood-stage infection, in which parasites replicate within red blood cells, causing fever, anemia and cerebral complications. Sterilizing immunity is never acquired despite repeated infections, but clinical immunity does develop, whereby an individual can harbor parasites for long periods without clinical symptoms (an asymptomatic infection). Evidence for a contribution of host genetic factors to mild clinical malaria and to biological phenotypes, such as the number of clinical episodes, parasite density and immune responses to P. falciparum antigens, has grown with the development of increasingly sophisticated techniques. Population-level differences in susceptibility to malaria have been observed between sympatric ethnic groups [1] and, at a finer scale, differential phenotypic expression has been observed between monozygotic and dizygotic twins [2]. The importance of host genetics has been further demonstrated by segregation studies [2,3], linkage analysis [4-7] and candidate gene approaches [8,9]. Overall, emphasis has understandably been placed on clinical malaria, and very few studies have considered asymptomatic malaria.



      Project status : Declined

      Biomarkers for early identification of sepsis in the emergency department (BIPS)

      Rationale. The clinical presentation of sepsis is highly polymorphic. In septic patients presenting to emergency departments, hyperthermia or other criteria of the systemic inflammatory response syndrome (SIRS) are not sufficient to support a diagnosis of sepsis. Extensive research has led to the proposal of countless sepsis biomarkers, mostly studied in intensive care. Although some, such as procalcitonin (PCT), have reached a relatively good predictive value in the emergency department, their routine use remains controversial. Given the complex pathophysiology of sepsis, a combinatorial approach could achieve performance that would be difficult to attain with a single biomarker. Primary objective. To study the statistical performance of a panel of biomarkers of interest, individually and in combination, for the diagnosis of sepsis in the emergency department. Secondary objectives. To study the statistical performance of a panel of biomarkers of interest, individually and in combination, for the diagnosis of severe septic states (severe sepsis and septic shock) and for risk stratification (prediction of intensive care admission and/or death). Study type. Single-center, prospective, non-interventional cohort study. Patients and inclusion criteria. 300 patients presenting to the emergency department with suspected sepsis, plus 30 healthy subjects. Non-inclusion criteria. Minors under 18 years of age, pregnant women, living conditions making 28-day follow-up impossible, refusal to participate in the study. Measurements. For each patient, 3 tubes are drawn during the initial blood workup for later measurement of a panel of biomarkers of interest exploring the various biological pathways activated during sepsis.
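The diagnostic performance of a biomarker is often summarized by the area under the ROC curve (AUC), which equals the probability that a randomly chosen septic patient has a higher marker value than a randomly chosen non-septic one. The sketch below computes that AUC directly and, for the "in combination" objective, scores a hypothetical panel by summing per-marker z-scores. That unweighted score is only an illustrative stand-in: a study like this would more typically fit a model such as logistic regression and validate it on held-out data. All names and values are hypothetical.

```python
from itertools import product
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def roc_auc(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """AUC = P(positive score > negative score), counting ties as 1/2."""
    if not positives or not negatives:
        raise ValueError("need at least one septic and one non-septic patient")
    wins = 0.0
    for sp, sn in product(positives, negatives):
        if sp > sn:
            wins += 1.0
        elif sp == sn:
            wins += 0.5
    return wins / (len(positives) * len(negatives))


def zscore_sum(panel: Dict[str, List[float]]) -> List[float]:
    """Hypothetical combined score: sum of per-marker z-scores for each patient.

    Assumes every marker list uses the same patient order and that higher
    values point towards sepsis for every marker.
    """
    columns = list(panel.values())
    stats = [(mean(col), pstdev(col)) for col in columns]
    n_patients = len(columns[0])
    return [sum((col[i] - mu) / sd for col, (mu, sd) in zip(columns, stats))
            for i in range(n_patients)]


if __name__ == "__main__":
    # Hypothetical values for 3 septic then 3 non-septic patients.
    panel = {"PCT": [2.1, 0.9, 5.0, 0.1, 0.3, 0.2],
             "marker_X": [40.0, 55.0, 30.0, 12.0, 35.0, 10.0]}
    is_septic = [True, True, True, False, False, False]
    for name, values in panel.items():
        pos = [v for v, s in zip(values, is_septic) if s]
        neg = [v for v, s in zip(values, is_septic) if not s]
        print(name, roc_auc(pos, neg))
    combined = zscore_sum(panel)
    pos = [v for v, s in zip(combined, is_septic) if s]
    neg = [v for v, s in zip(combined, is_septic) if not s]
    print("combined", roc_auc(pos, neg))
```

The pairwise loop is quadratic in sample size, which is harmless for a cohort of about 330 subjects; a rank-based formula would scale better for larger studies.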



      Project status : In Progress

      Genotype to phenotype analysis of immune responses in chronic inflammatory diseases



      Project status : In Progress

      Evaluation of the genetic representativeness of a strain pool

      Objectives: Our objective was to use whole-genome sequencing to establish a comprehensive repertoire of the non-synonymous polymorphisms (natural polymorphisms and/or resistance mutations) in genes involved in resistance to azoles and echinocandins. Methods: Two collections of C. albicans clinical isolates were used. The first consists of 151 epidemiologically unrelated strains susceptible to antifungal agents. The second consists of 9 isolates resistant to fluconazole and 1 isolate resistant to both fluconazole and caspofungin. Whole-genome sequencing was performed on an Illumina HiSeq 2000, generating 100 bp reads (roughly 100X average coverage). Reads were mapped to the SC5314 reference genome (Assembly 22) using the BWA aligner. SNPs were then called with GATK and filtered using the recommended filters. Using in-house scripts, we analyzed the sequences of 5 genes involved in resistance to azoles (ERG11, TAC1, MRR1 and UPC2) and echinocandins (FKS1) and compared them with the sequences of the reference strain SC5314. Results: Among the 151 antifungal-susceptible strains, we identified 126 distinct natural amino-acid substitutions. Using this repertoire, we identified 22 amino-acid substitutions in the 10 resistant strains, in addition to some of the above-mentioned natural polymorphisms. Of these, 10 have already been associated with azole or echinocandin resistance. The remaining 12 substitutions are novel putative azole resistance mutations affecting ERG11 (n=4/11), TAC1 (n=6/7) and UPC2 (n=2/3).
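The comparison step, translating each gene and separating new substitutions from the natural polymorphisms seen in susceptible strains, can be sketched as follows. This is a hypothetical illustration, not the project's in-house scripts: it assumes indel-free coding sequences already aligned to the SC5314 reference, ignores heterozygosity (C. albicans is diploid), and decodes CTG as serine, as in the CTG clade to which C. albicans belongs (NCBI translation table 12).

```python
from typing import Iterable, Set

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order at each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
CODE["CTG"] = "S"  # CTG clade (incl. C. albicans): CTG read as serine, NCBI table 12


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon (trailing partial codon ignored)."""
    cds = cds.upper()
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def substitutions(ref_cds: str, strain_cds: str) -> Set[str]:
    """Amino-acid substitutions as labels like 'Y2H' (ref aa, 1-based position, alt aa)."""
    if len(ref_cds) != len(strain_cds):
        raise ValueError("sketch assumes aligned, indel-free coding sequences")
    ref, alt = translate(ref_cds), translate(strain_cds)
    return {f"{r}{pos}{a}" for pos, (r, a) in enumerate(zip(ref, alt), start=1) if r != a}


def candidate_resistance(ref_cds: str, resistant_cds: str,
                         susceptible_cdss: Iterable[str]) -> Set[str]:
    """Substitutions in a resistant strain that no susceptible strain carries."""
    natural: Set[str] = set()
    for cds in susceptible_cdss:
        natural |= substitutions(ref_cds, cds)
    return substitutions(ref_cds, resistant_cds) - natural


if __name__ == "__main__":
    # Toy 3-codon sequences, not real gene data.
    print(candidate_resistance("ATGTATCTG", "ATGCATTTG", ["ATGTATTTG"]))
```

Using the standard code instead of table 12 would mistranslate every CTG codon as leucine and report spurious S-to-L differences, which is why the override matters for this species.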



      Project status : In Progress

      Insight into the Immune System: A bioresource and data-sharing platform to study chronic inflammatory diseases (IsIShare)

      Chronic inflammatory systemic diseases (CIDs) are a heavy burden because of life-long debilitating illness, increased mortality and high therapy costs. Their increasing prevalence in western countries has placed CIDs third among causes of morbidity and mortality. Unfortunately, available treatments are poorly targeted and non-curative, partly because of a complex and poorly understood pathophysiology. Genetic susceptibility clearly plays a role: genes linked to the immune system have been identified, but causal genes remain mostly unknown, and other factors such as the intestinal microbiota have also been implicated. The complexity of CID pathophysiology suggests that a holistic approach is the most likely to yield significant progress. Our project intends to take advantage of recent technical progress and new informatics tools to set up a transversal approach. High-resolution sequencing now rapidly produces large amounts of accurate data, and new integrative informatics tools allow the storage and integrative analysis of the resulting large data volumes. We intend to set up a CID network for gathering and extensively analyzing data on immunogenetic determinants, immune repertoire and microbiota from individuals suffering from one of the three major interlinked CIDs, namely Hidradenitis Suppurativa (HS), Crohn’s disease (CD) and Spondyloarthropathy (SpA), compared with healthy volunteers.



      Project status : Closed