Increased peak detection accuracy in over-dispersed ChIP-seq data with
supervised segmentation models
- URL: http://arxiv.org/abs/2012.06848v2
- Date: Tue, 15 Dec 2020 12:34:48 GMT
- Title: Increased peak detection accuracy in over-dispersed ChIP-seq data with
supervised segmentation models
- Authors: Arnaud Liehrmann, Guillem Rigaill and Toby Dylan Hocking
- Abstract summary: We show that the unconstrained multiple changepoint detection model, with alternative noise assumptions and a suitable setup, reduces the over-dispersion exhibited by count data.
- Score: 2.2559617939136505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivation: Histone modification constitutes a basic mechanism for the
genetic regulation of gene expression. In the early 2000s, a powerful technique
emerged that couples chromatin immunoprecipitation with high-throughput
sequencing (ChIP-seq). This technique provides a direct survey of the DNA
regions associated with these modifications. In order to realize the full
potential of this technique, increasingly sophisticated statistical algorithms
have been developed or adapted to analyze the massive amount of data it
generates. Many of these algorithms were built around natural assumptions such
as the Poisson one to model the noise in the count data. In this work we start
from these natural assumptions and show that it is possible to improve upon
them. Results: The results of our comparisons on seven reference datasets of
histone modifications (H3K36me3 and H3K4me3) suggest that natural assumptions
are not always realistic under application conditions. We show that the
unconstrained multiple changepoint detection model, with alternative noise
assumptions and a suitable setup, reduces the over-dispersion exhibited by
count data and turns out to detect peaks more accurately than algorithms which
rely on these natural assumptions.
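To make the idea of an unconstrained multiple changepoint model under Poisson noise concrete, here is a minimal sketch of penalized segmentation by dynamic programming. It is an illustration under stated assumptions, not the authors' implementation: the function names `poisson_cost` and `optimal_partitioning`, the toy coverage vector, and the penalty value are all hypothetical. An alternative noise assumption would amount to swapping `poisson_cost` for a different per-segment loss.

```python
import math


def poisson_cost(total, n):
    """Poisson negative log-likelihood of a segment at its best mean.

    `total` is the sum of the counts and `n` the segment length. The
    log-factorial term is dropped because it does not depend on where
    the changepoints are placed.
    """
    if total == 0:
        return 0.0
    return total - total * math.log(total / n)


def optimal_partitioning(counts, penalty):
    """Return segment end positions (exclusive) that minimize the sum of
    segment costs plus `penalty` for every changepoint.

    Plain O(n^2) dynamic programming, with no constraint on the
    direction of successive changes.
    """
    n = len(counts)
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)

    best = [0.0] * (n + 1)  # best[t]: optimal penalized cost of counts[:t]
    best[0] = -penalty      # the first segment carries no penalty
    last = [0] * (n + 1)    # last[t]: start index of the final segment

    for t in range(1, n + 1):
        candidates = (
            (best[s] + penalty + poisson_cost(prefix[t] - prefix[s], t - s), s)
            for s in range(t)
        )
        best[t], last[t] = min(candidates)

    # Walk back through the stored segment starts.
    ends = []
    t = n
    while t > 0:
        ends.append(t)
        t = last[t]
    return ends[::-1]


# Hypothetical coverage profile: background, one enriched region, background.
coverage = [2, 3, 2, 3, 2, 3, 20, 22, 19, 21, 20, 22, 2, 3, 2, 3, 2, 3]
print(optimal_partitioning(coverage, penalty=5.0))  # [6, 12, 18]
```

On this toy profile the recovered segments are positions 0-5, 6-11 and 12-17, so the middle segment would be the candidate peak. The penalty controls how many changepoints are kept; in a supervised setting it would be tuned against labeled regions rather than fixed by hand.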
Related papers
- Learning Augmentation Policies from A Model Zoo for Time Series Forecasting [58.66211334969299]
We introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning.
By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance.
(2024-09-10T07:34:19Z)
- Stacked ensemble-based mutagenicity prediction model using multiple modalities with graph attention network [0.9736758288065405]
Mutagenicity is a concern due to its association with genetic mutations which can result in a variety of negative consequences.
In this work, we introduce a novel stacked ensemble based mutagenicity prediction model.
(2024-09-03T09:14:21Z)
- Conditionally-Conjugate Gaussian Process Factor Analysis for Spike Count Data via Data Augmentation [8.114880112033644]
Recently, GPFA has been extended to model spike count data.
We propose a conditionally-conjugate Gaussian process factor analysis (ccGPFA) resulting in both analytically and computationally tractable inference.
(2024-05-19T21:53:36Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
(2024-05-01T15:59:00Z)
- Sample, estimate, aggregate: A recipe for causal discovery foundation models [28.116832159265964]
We train a supervised model that learns to predict a larger causal graph from the outputs of classical causal discovery algorithms run over subsets of variables.
Our approach is enabled by the observation that typical errors in the outputs of classical methods remain comparable across datasets.
Experiments on real and synthetic data demonstrate that this model maintains high accuracy in the face of misspecification or distribution shift.
(2024-02-02T21:57:58Z)
- Predicting loss-of-function impact of genetic mutations: a machine learning approach [0.0]
This paper aims to train machine learning models on the attributes of a genetic mutation to predict LoFtool scores.
These attributes included, but were not limited to, the position of a mutation on a chromosome, changes in amino acids, and changes in codons caused by the mutation.
Models were evaluated using five-fold cross-validated averages of r-squared, mean squared error, root mean squared error, mean absolute error, and explained variance.
(2024-01-26T19:27:38Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
(2022-12-06T12:42:11Z)
- Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
(2021-09-06T14:52:58Z)
- Latent Gaussian Model Boosting [0.0]
Tree-boosting shows excellent predictive accuracy on many data sets.
We obtain increased predictive accuracy compared to existing approaches in both simulated and real-world data experiments.
(2021-05-19T07:36:30Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
(2020-10-05T20:03:13Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
(2020-06-26T13:50:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.