Randomized Block-Diagonal Preconditioning for Parallel Learning
- URL: http://arxiv.org/abs/2006.13591v2
- Date: Mon, 7 Dec 2020 09:33:02 GMT
- Title: Randomized Block-Diagonal Preconditioning for Parallel Learning
- Authors: Celestine Mendler-Dünner, Aurelien Lucchi
- Abstract summary: We study preconditioned gradient-based optimization methods where the preconditioning matrix has block-diagonal form.
Our main contribution is to demonstrate that the convergence of these methods can be significantly improved by a randomization technique.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study preconditioned gradient-based optimization methods where the
preconditioning matrix has block-diagonal form. Such a structural constraint
comes with the advantage that the update computation is block-separable and can
be parallelized across multiple independent tasks. Our main contribution is to
demonstrate that the convergence of these methods can be significantly improved
by a randomization technique which corresponds to repartitioning coordinates
across tasks during the optimization procedure. We provide a theoretical
analysis that accurately characterizes the expected convergence gains of
repartitioning and validate our findings empirically on various traditional
machine learning tasks. From an implementation perspective, block-separable
models are well suited for parallelization and, when shared memory is
available, randomization can be implemented on top of existing methods very
efficiently to improve convergence.
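The update rule described in the abstract is straightforward to prototype: compute the full gradient, let each coordinate block (each of which could live on an independent worker) solve a small linear system against its diagonal block of the curvature matrix, and periodically re-draw the coordinate partition at random. The NumPy sketch below does this for a ridge-regression objective; the function name, the use of exact Hessian blocks, the step size, and the repartitioning schedule are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch (assumed details, not the paper's code): block-diagonal
# preconditioned gradient descent with randomized repartitioning, applied to
# ridge regression  f(x) = 0.5*||Ax - b||^2 + 0.5*lam*||x||^2.
import numpy as np

def randomized_block_precond_gd(A, b, lam=1e-2, n_blocks=4, step=1.0,
                                n_iters=100, repartition_every=10, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.zeros(n)
    H = A.T @ A + lam * np.eye(n)             # exact Hessian of the quadratic

    def repartition():
        # Randomization step: shuffle coordinates, then split them into blocks.
        return np.array_split(rng.permutation(n), n_blocks)

    blocks = repartition()
    for t in range(1, n_iters + 1):
        g = A.T @ (A @ x - b) + lam * x       # full gradient
        # Block-separable update: each block solve only touches its own
        # coordinates, so this loop can be farmed out to parallel workers.
        for idx in blocks:
            H_bb = H[np.ix_(idx, idx)]        # diagonal block of H
            x[idx] -= step * np.linalg.solve(H_bb, g[idx])
        if t % repartition_every == 0:
            blocks = repartition()            # repartition coordinates across tasks
    return x
```

With step=1.0 each block performs a Newton step on its own coordinates (block-Jacobi style); a smaller step size may be needed when the coupling between blocks is strong.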
Related papers
- A Generalization Result for Convergence in Learning-to-Optimize [4.112909937203119]
Conventional convergence guarantees in optimization are based on geometric arguments, which cannot be applied directly to learned algorithms.
To the best of our knowledge, we are the first to prove such a result.
arXiv Detail & Related papers (2024-10-10T08:17:04Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- A unified consensus-based parallel ADMM algorithm for high-dimensional regression with combined regularizations [3.280169909938912]
The parallel alternating direction method of multipliers (ADMM) is widely recognized for its effectiveness in handling large-scale distributed datasets.
A financial example demonstrates the reliability, stability, and scalability of the proposed algorithms.
arXiv Detail & Related papers (2023-11-21T03:30:38Z)
- Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z)
- Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation [88.14365009076907]
Iterative refinement is a useful paradigm for representation learning.
We develop an implicit differentiation approach that improves the stability and tractability of training.
arXiv Detail & Related papers (2022-07-02T10:00:35Z)
- Progressive Batching for Efficient Non-linear Least Squares [31.082253632197023]
Most improvements of the basic Gauss-Newton method tackle convergence guarantees or leverage the sparsity of the underlying problem structure for computational speedup.
Our work borrows ideas from both machine learning and statistics, and we present an approach for non-linear least-squares that guarantees convergence while at the same time significantly reducing the required amount of computation.
arXiv Detail & Related papers (2020-10-21T13:00:04Z)
- Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
- CWY Parametrization: a Solution for Parallelized Optimization of Orthogonal and Stiefel Matrices [41.57234424773276]
We introduce an efficient approach for optimization over orthogonal groups on highly parallel computation units such as GPUs or TPUs.
We further develop a novel Truncated CWY (or T-CWY) approach for Stiefel manifold parametrization.
We apply our methods to train recurrent neural network architectures in the tasks of neural machine translation and video prediction.
arXiv Detail & Related papers (2020-04-18T17:58:43Z)
- A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms [67.67377846416106]
We present a distributional approach to the theoretical analysis of reinforcement learning algorithms with constant step-sizes.
We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv Detail & Related papers (2020-03-27T05:13:29Z)
- Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving [106.63673243937492]
Feedforward computation, such as evaluating a neural network or sampling from an autoregressive model, is ubiquitous in machine learning.
We frame the task of feedforward computation as solving a system of nonlinear equations. We then propose to find the solution using a Jacobi or Gauss-Seidel fixed-point method, as well as hybrid methods of both (a minimal sketch of the Jacobi variant appears after this list).
Our method is guaranteed to give exactly the same values as the original feedforward computation with a reduced (or equal) number of parallelizable iterations, and hence reduced time given sufficient parallel computing power.
arXiv Detail & Related papers (2020-02-10T10:11:31Z)
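As a concrete illustration of the last entry above (Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving), here is a minimal sketch of the Jacobi fixed-point view: every layer output is updated from the previous iterate, so all per-layer evaluations are independent and parallelizable, and with L layers the iteration reproduces the sequential forward pass after at most L sweeps. The toy layers, names, and the equal-shape assumption are illustrative, not taken from that paper.

```python
# Hedged sketch: Jacobi fixed-point iteration for feedforward computation.
# Solves x_i = f_i(x_{i-1}) for all layers simultaneously; with L layers it
# matches the sequential forward pass after at most L iterations.
import numpy as np

def jacobi_feedforward(layers, x0, max_iters=None, tol=1e-10):
    L = len(layers)
    max_iters = max_iters or L
    xs = [np.zeros_like(x0) for _ in range(L)]   # guesses for all layer outputs
    for _ in range(max_iters):
        prev = [x0] + xs[:-1]
        # Jacobi step: every layer reads only the previous iterate, so all L
        # evaluations below could run in parallel.
        new_xs = [f(p) for f, p in zip(layers, prev)]
        converged = all(np.allclose(a, b, atol=tol) for a, b in zip(new_xs, xs))
        xs = new_xs
        if converged:
            break
    return xs[-1]

# Usage on toy shape-preserving "layers": agrees with sequential evaluation.
fs = [lambda x: np.tanh(x + 1.0), lambda x: 2.0 * x, lambda x: x - 0.5]
x0 = np.ones(3)
assert np.allclose(jacobi_feedforward(fs, x0), fs[2](fs[1](fs[0](x0))))
```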
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.