Population based change-point detection for the identification of
homozygosity islands
- URL: http://arxiv.org/abs/2111.10187v1
- Date: Fri, 19 Nov 2021 12:53:41 GMT
- Title: Population based change-point detection for the identification of
homozygosity islands
- Authors: Lucas Prates, Renan B Lemes, T\'abita H\"unemeier and Florencia
Leonardi
- Abstract summary: We introduce a penalized maximum likelihood approach that can be efficiently computed by a dynamic programming algorithm or approximated by a fast greedy binary splitting algorithm.
We prove both algorithms converge almost surely to the set of change-points under very general assumptions on the distribution and independent sampling of the random vector.
This new approach is motivated by the problem of identifying homozygosity islands on the genome of individuals in a population.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a new method for offline change-point detection on
some parameters of the distribution of a random vector. We introduce a
penalized maximum likelihood approach that can be efficiently computed by a
dynamic programming algorithm or approximated by a fast greedy binary splitting
algorithm. We prove both algorithms converge almost surely to the set of
change-points under very general assumptions on the distribution and
independent sampling of the random vector. In particular, we show the
assumptions leading to the consistency of the algorithms are satisfied by
categorical and Gaussian random variables. This new approach is motivated by
the problem of identifying homozygosity islands on the genome of individuals in
a population. Our method directly tackles the identification of the
homozygosity islands at the population level, without the need to analyze
single individuals and then combine the results, as is done in current
state-of-the-art approaches.
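The abstract names two solvers for the penalized maximum likelihood criterion: an exact dynamic programming algorithm and a fast greedy binary splitting approximation. A rough sketch of the greedy variant only (the sum-of-squares cost below is a Gaussian stand-in for the paper's likelihood, and `penalty`, `min_size`, and the toy data are illustrative assumptions):

```python
import numpy as np

def segment_cost(x):
    # Within-segment cost: sum of squared deviations from the segment mean,
    # proportional to a Gaussian negative log-likelihood with known variance.
    return np.sum((x - x.mean()) ** 2)

def greedy_binary_split(x, penalty, lo=0, hi=None, min_size=2):
    """Recursively split [lo, hi) at the point that most reduces the cost;
    stop when the best cost reduction does not exceed the penalty."""
    if hi is None:
        hi = len(x)
    base = segment_cost(x[lo:hi])
    best_gain, best_k = 0.0, None
    for k in range(lo + min_size, hi - min_size + 1):
        gain = base - segment_cost(x[lo:k]) - segment_cost(x[k:hi])
        if gain > best_gain:
            best_gain, best_k = gain, k
    if best_k is None or best_gain <= penalty:
        return []
    return (greedy_binary_split(x, penalty, lo, best_k, min_size)
            + [best_k]
            + greedy_binary_split(x, penalty, best_k, hi, min_size))

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
cps = greedy_binary_split(x, penalty=10.0)
print(cps)  # detected change-points (true change at index 100)
```

The exact dynamic programming solver would instead minimize the penalized cost over all segmentations, at higher computational cost; the greedy recursion above trades exactness for speed, mirroring the trade-off described in the abstract.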
Related papers
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Distributed Bayesian Estimation in Sensor Networks: Consensus on Marginal Densities [15.038649101409804]
We derive a distributed provably-correct algorithm in the functional space of probability distributions over continuous variables.
We leverage these results to obtain new distributed estimators restricted to subsets of variables observed by individual agents.
This relates to applications such as cooperative localization and federated learning, where the data collected at any agent depends on a subset of all variables of interest.
arXiv Detail & Related papers (2023-12-02T21:10:06Z)
- Calibrate and Debias Layer-wise Sampling for Graph Convolutional Networks [39.56471534442315]
This paper revisits the approach from a matrix approximation perspective.
We propose a new principle for constructing sampling probabilities and an efficient debiasing algorithm.
Improvements are demonstrated by extensive analyses of estimation variance and experiments on common benchmarks.
arXiv Detail & Related papers (2022-06-01T15:52:06Z)
- A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate gradients of the same population objective.
We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z)
- An Embedded Model Estimator for Non-Stationary Random Functions using Multiple Secondary Variables [0.0]
This paper introduces the method and shows that it has consistency results that are similar in nature to those applying to geostatistical modelling and to Quantile Random Forests.
The algorithm works by estimating a conditional distribution for the target variable at each target location.
arXiv Detail & Related papers (2020-11-09T00:14:24Z)
- Pathwise Conditioning of Gaussian Processes [72.61885354624604]
Conventional approaches for simulating Gaussian process posteriors view samples as draws from marginal distributions of process values at finite sets of input locations.
This distribution-centric characterization leads to generative strategies that scale cubically in the size of the desired random vector.
We show how this pathwise interpretation of conditioning gives rise to a general family of approximations that lend themselves to efficiently sampling Gaussian process posteriors.
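The "pathwise interpretation of conditioning" referred to here is Matheron's rule: a posterior sample equals a prior sample plus a data-driven correction, so one joint prior draw can be updated instead of factorizing a large posterior covariance. A toy numpy sketch (the RBF kernel, jitter value, and noise-free data are illustrative assumptions, not the paper's full method):

```python
import numpy as np

def rbf(a, b, ls=1.0):
    # Squared-exponential kernel on 1-D inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

rng = np.random.default_rng(1)
X = np.array([-1.0, 0.0, 1.0])      # observed inputs
y = np.array([0.5, -0.2, 0.3])      # noise-free observations
Xs = np.linspace(-2.0, 2.0, 50)     # test inputs
Xa = np.concatenate([X, Xs])

# 1) Draw one joint prior sample over observed and test inputs.
K = rbf(Xa, Xa) + 1e-6 * np.eye(len(Xa))
f = np.linalg.cholesky(K) @ rng.standard_normal(len(Xa))

# 2) Matheron's rule: posterior sample = prior sample
#    + K(., X) K(X, X)^{-1} (y - prior value at X).
Kxx = rbf(X, X) + 1e-6 * np.eye(len(X))
post = f + rbf(Xa, X) @ np.linalg.solve(Kxx, y - f[:len(X)])
```

The updated sample `post` passes (up to jitter) through the observations at `X`, while the cost of the update is dominated by solving against the small matrix `K(X, X)` rather than the full joint covariance.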
arXiv Detail & Related papers (2020-11-08T17:09:37Z)
- Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient.
arXiv Detail & Related papers (2020-07-01T18:43:32Z)
- Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate the empirical advantages of our methods over standard gradient methods on two popular deep learning models.
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
- Stochastic Saddle-Point Optimization for Wasserstein Barycenters [69.68068088508505]
We consider the population Wasserstein barycenter problem for random
probability measures supported on a finite set of points and generated by an
online stream of data.
We employ the structure of the problem and obtain a convex-concave saddle-point reformulation of this problem.
In the setting when the distribution of random probability measures is discrete, we propose an optimization algorithm and estimate its complexity.
arXiv Detail & Related papers (2020-06-11T19:40:38Z)
- Mean shift cluster recognition method implementation in the nested sampling algorithm [0.0]
Nested sampling is an efficient algorithm for the calculation of the Bayesian evidence and posterior parameter probability distributions.
Here we present a new solution based on the mean shift cluster recognition method implemented in a random walk search algorithm.
arXiv Detail & Related papers (2020-01-31T15:04:30Z)
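Mean shift itself is a simple mode-seeking iteration: repeatedly replace each point by the average of its neighbours within a bandwidth, so points belonging to the same mode converge together. A flat-kernel toy sketch (the bandwidth, data, and fixed iteration count are illustrative; the paper's integration with nested sampling is not shown):

```python
import numpy as np

def mean_shift(points, bandwidth, iters=20):
    # Flat-kernel mean shift: move each candidate to the mean of the
    # original points lying within `bandwidth` of it.
    modes = points.copy()
    for _ in range(iters):
        for i, p in enumerate(modes):
            nbrs = points[np.linalg.norm(points - p, axis=1) < bandwidth]
            modes[i] = nbrs.mean(axis=0)
    return modes

rng = np.random.default_rng(2)
pts = np.vstack([rng.normal(0.0, 0.1, (30, 2)),
                 rng.normal(3.0, 0.1, (30, 2))])
modes = mean_shift(pts, bandwidth=1.0)
```

With two well-separated blobs, every point converges to its own cluster's centroid, so the distinct rows of `modes` identify the clusters without fixing their number in advance.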
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.