Distributed Linear Regression with Compositional Covariates
- URL: http://arxiv.org/abs/2310.13969v1
- Date: Sat, 21 Oct 2023 11:09:37 GMT
- Title: Distributed Linear Regression with Compositional Covariates
- Authors: Yue Chao, Lei Huang, Xuejun Ma
- Abstract summary: We focus on the distributed sparse penalized linear log-contrast model in massive compositional data.
Two distributed optimization techniques are proposed for solving the two different constrained convex optimization problems.
In the decentralized topology, we introduce a distributed coordinate-wise descent algorithm for obtaining a communication-efficient regularized estimation.
- Score: 5.085889377571319
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the availability of extraordinarily huge data sets, solving the problems
of distributed statistical methodology and computing for such data sets has
become increasingly crucial in the big data area. In this paper, we focus on
the distributed sparse penalized linear log-contrast model in massive
compositional data. In particular, two distributed optimization techniques
under centralized and decentralized topologies are proposed for solving the two
different constrained convex optimization problems. Both two proposed
algorithms are based on the frameworks of Alternating Direction Method of
Multipliers (ADMM) and Coordinate Descent Method of Multipliers(CDMM, Lin et
al., 2014, Biometrika). It is worth emphasizing that, in the decentralized
topology, we introduce a distributed coordinate-wise descent algorithm based on
Group ADMM(GADMM, Elgabli et al., 2020, Journal of Machine Learning Research)
for obtaining a communication-efficient regularized estimation.
Correspondingly, the convergence theories of the proposed algorithms are
rigorously established under some regularity conditions. Numerical experiments
on both synthetic and real data are conducted to evaluate our proposed
algorithms.
Related papers
- Maximum Likelihood Estimation on Stochastic Blockmodels for Directed Graph Clustering [22.421702511126373]
We formulate clustering as estimating underlying communities in the directed block model.
We introduce two efficient and interpretable directed clustering algorithms, a spectral clustering algorithm and a semidefinite programming based clustering algorithm.
arXiv Detail & Related papers (2024-03-28T15:47:13Z) - Quantized Hierarchical Federated Learning: A Robust Approach to
Statistical Heterogeneity [3.8798345704175534]
We present a novel hierarchical federated learning algorithm that incorporates quantization for communication-efficiency.
We offer a comprehensive analytical framework to evaluate its optimality gap and convergence rate.
Our findings reveal that our algorithm consistently achieves high learning accuracy over a range of parameters.
arXiv Detail & Related papers (2024-03-03T15:40:24Z) - Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classification [72.77513633290056]
We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model.
Our method captures intricate patterns and relationships, enhancing classification performance.
arXiv Detail & Related papers (2024-02-14T16:10:42Z) - Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - A unified consensus-based parallel ADMM algorithm for high-dimensional
regression with combined regularizations [3.280169909938912]
parallel alternating multipliers (ADMM) is widely recognized for its effectiveness in handling large-scale distributed datasets.
The proposed algorithms serve to demonstrate the reliability, stability, and scalability of a financial example.
arXiv Detail & Related papers (2023-11-21T03:30:38Z) - Stability and Generalization of the Decentralized Stochastic Gradient
Descent Ascent Algorithm [80.94861441583275]
We investigate the complexity of the generalization bound of the decentralized gradient descent (D-SGDA) algorithm.
Our results analyze the impact of different top factors on the generalization of D-SGDA.
We also balance it with the generalization to obtain the optimal convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z) - On the Effects of Data Heterogeneity on the Convergence Rates of Distributed Linear System Solvers [9.248526557884498]
We consider the problem of solving a large-scale system of linear equations in a distributed or federated manner by a taskmaster and a set of machines.
We compare two well-known classes of algorithms used to solve this problem: projection-based methods and optimization-based methods.
arXiv Detail & Related papers (2023-04-20T20:48:00Z) - Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z) - On the Convergence of Distributed Stochastic Bilevel Optimization
Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms restrict their single-machine setting so that they are incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradients.
arXiv Detail & Related papers (2022-06-30T05:29:52Z) - Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
As results show, the proposed strategies perform better than the classification based on observed data and allow to keep a high accuracy even when the missing data ratio increases.
arXiv Detail & Related papers (2021-10-19T14:24:50Z) - Distributed Variational Bayesian Algorithms Over Sensor Networks [6.572330981878818]
We propose two novel distributed VB algorithms for general Bayesian inference problem.
The proposed algorithms have excellent performance, which are almost as good as the corresponding centralized VB algorithm relying on all data available in a fusion center.
arXiv Detail & Related papers (2020-11-27T08:12:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.