EVENT : C3BI Seminars
Bayesian Markov models for regulatory motif prediction – Sensitive sequence searching for parallelized analysis of massive data sets
Main speaker : Johannes Soeding, Head of the Quantitative and Computational Biology group, Max Planck Institute for Biophysical Chemistry, Goettingen
Date : 06/10/2016 at 02:00 pm
Location : Retrovirus room – LWOFF (22), Institut Pasteur, Paris
Bayesian Markov models consistently outperform PWMs at regulatory motif prediction – Sensitive protein sequence searching for parallelized analysis of massive data sets
The talk will cover two very different topics. First, I will present a Bayesian approach to motif discovery using Markov models in which the conditional probabilities of order k − 1 act as priors for those of order k. Training these Bayesian Markov models (BaMMs) automatically adapts model complexity to the amount of available data. BaMMs improve on PWMs by ~40% in AUC on ~400 ENCODE ChIP-seq data sets and achieve similar improvements in detecting core promoter sequences, poly(A) sites, RNAP pause sites and binding sites for ~20 PAR-CLIPped RNA binding factors. BaMMs never performed worse than PWMs. These robust improvements argue in favour of generally replacing PWMs by BaMMs.
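The idea that lower-order probabilities act as priors for higher-order ones can be illustrated with a small sketch. This is not the speaker's implementation: it is a simplified, position-independent model, and the function names (`count_kmers`, `bamm_prob`), the single pseudocount strength `alpha`, and the uniform prior at order 0 are all assumptions made for illustration. The order-k estimate adds `alpha` pseudocounts distributed according to the order k − 1 estimate, so contexts with few observations fall back to the lower order while well-sampled contexts are dominated by their own counts.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumes sequences contain only these four letters


def count_kmers(seqs, max_order):
    """counts[k] holds the counts of all (k+1)-mers, for k = 0..max_order."""
    counts = [Counter() for _ in range(max_order + 1)]
    for s in seqs:
        for k in range(max_order + 1):
            for i in range(len(s) - k):
                counts[k][s[i:i + k + 1]] += 1
    return counts


def bamm_prob(counts, context, x, alpha=2.0):
    """Estimate P(x | context); the model order is len(context).

    alpha is a hypothetical pseudocount strength, not a value from the talk.
    """
    k = len(context)
    if k == 0:
        # Order 0: a uniform distribution serves as the prior.
        total = sum(counts[0].values())
        return (counts[0][x] + alpha / len(ALPHABET)) / (total + alpha)
    # The order k-1 estimate (context shortened by its oldest letter)
    # acts as the prior for order k.
    prior = bamm_prob(counts, context[1:], x, alpha)
    n_context = sum(counts[k][context + b] for b in ALPHABET)
    return (counts[k][context + x] + alpha * prior) / (n_context + alpha)


if __name__ == "__main__":
    seqs = ["ACGTACGT", "ACGAACGT"]  # toy data, for illustration only
    counts = count_kmers(seqs, max_order=2)
    for x in ALPHABET:
        print(x, bamm_prob(counts, "AC", x))
```

For a context that never occurs in the training data, the estimate reduces exactly to the lower-order probability, which is how the model complexity adapts to the amount of data.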
Second, I will present our new method MMseqs2 (Many-against-Many sequence searching) for very fast batch protein sequence searches and clustering of huge protein sequence data sets. Protein sequence searching is the main time, cost and quality bottleneck for the analysis of metagenomic datasets. While previous search methods sacrificed sensitivity for speed gains, MMseqs2 is as sensitive as BLAST, more sensitive than PSI-BLAST, and 36 to 1300 times faster. I will explain the ideas that led to this massive improvement. MMseqs2 searching and clustering will considerably increase the fraction of annotatable metagenomic ORFs.
Due to the security policy at Institut Pasteur, please register in advance if you plan to attend this seminar.