Randomized Block-Diagonal Preconditioning for Parallel Learning
- URL: http://arxiv.org/abs/2006.13591v2
- Date: Mon, 7 Dec 2020 09:33:02 GMT
- Title: Randomized Block-Diagonal Preconditioning for Parallel Learning
- Authors: Celestine Mendler-Dünner, Aurelien Lucchi
- Abstract summary: We study preconditioned gradient-based optimization methods where the preconditioning matrix has block-diagonal form.
Our main contribution is to demonstrate that the convergence of these methods can be significantly improved by a randomization technique.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study preconditioned gradient-based optimization methods where the
preconditioning matrix has block-diagonal form. Such a structural constraint
comes with the advantage that the update computation is block-separable and can
be parallelized across multiple independent tasks. Our main contribution is to
demonstrate that the convergence of these methods can be significantly improved
by a randomization technique which corresponds to repartitioning coordinates
across tasks during the optimization procedure. We provide a theoretical
analysis that accurately characterizes the expected convergence gains of
repartitioning and validate our findings empirically on various traditional
machine learning tasks. From an implementation perspective, block-separable
models are well suited for parallelization and, when shared memory is
available, randomization can be implemented on top of existing methods very
efficiently to improve convergence.
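The update rule described in the abstract is straightforward to prototype: compute the full gradient, let each coordinate block (each of which could live on an independent worker) solve a small linear system against its diagonal block of the curvature matrix, and periodically re-draw the coordinate partition at random. The NumPy sketch below does this for a ridge-regression objective; the function name, the use of exact Hessian blocks, the step size, and the repartitioning schedule are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch (assumed details, not the paper's code): block-diagonal
# preconditioned gradient descent with randomized repartitioning, applied to
# ridge regression  f(x) = 0.5*||Ax - b||^2 + 0.5*lam*||x||^2.
import numpy as np

def randomized_block_precond_gd(A, b, lam=1e-2, n_blocks=4, step=1.0,
                                n_iters=100, repartition_every=10, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.zeros(n)
    H = A.T @ A + lam * np.eye(n)             # exact Hessian of the quadratic

    def repartition():
        # Randomization step: shuffle coordinates, then split them into blocks.
        return np.array_split(rng.permutation(n), n_blocks)

    blocks = repartition()
    for t in range(1, n_iters + 1):
        g = A.T @ (A @ x - b) + lam * x       # full gradient
        # Block-separable update: each block solve only touches its own
        # coordinates, so this loop can be farmed out to parallel workers.
        for idx in blocks:
            H_bb = H[np.ix_(idx, idx)]        # diagonal block of H
            x[idx] -= step * np.linalg.solve(H_bb, g[idx])
        if t % repartition_every == 0:
            blocks = repartition()            # repartition coordinates across tasks
    return x
```

With step=1.0 each block performs a Newton step on its own coordinates (block-Jacobi style); a smaller step size may be needed when the coupling between blocks is strong.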
Related papers
- A Generalization Result for Convergence in Learning-to-Optimize [4.112909937203119]
Conventional convergence guarantees in optimization are based on geometric arguments, which cannot be applied directly to learned algorithms.
To the best of our knowledge, we are the first to prove such a result.
arXiv Detail & Related papers (2024-10-10T08:17:04Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- A unified consensus-based parallel ADMM algorithm for high-dimensional regression with combined regularizations [3.280169909938912]
The parallel alternating direction method of multipliers (ADMM) is widely recognized for its effectiveness in handling large-scale distributed datasets.
A financial example demonstrates the reliability, stability, and scalability of the proposed algorithms.
arXiv Detail & Related papers (2023-11-21T03:30:38Z)
- Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z)
- Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation [88.14365009076907]
Iterative refinement is a useful paradigm for representation learning.
We develop an implicit differentiation approach that improves the stability and tractability of training.
arXiv Detail & Related papers (2022-07-02T10:00:35Z)
- Progressive Batching for Efficient Non-linear Least Squares [31.082253632197023]
Most improvements of the basic Gauss-Newton method tackle convergence guarantees or leverage the sparsity of the underlying problem structure for computational speedup.
Our work borrows ideas from both machine learning and statistics, and we present an approach for non-linear least-squares that guarantees convergence while at the same time significantly reducing the required amount of computation.
arXiv Detail & Related papers (2020-10-21T13:00:04Z)
- Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
- CWY Parametrization: a Solution for Parallelized Optimization of Orthogonal and Stiefel Matrices [41.57234424773276]
We introduce an efficient approach for optimization over orthogonal groups on highly parallel computation units such as GPUs or TPUs.
We further develop a novel Truncated CWY (or T-CWY) approach for Stiefel manifold parametrization.
We apply our methods to train recurrent neural network architectures in the tasks of neural machine translation and video prediction.
arXiv Detail & Related papers (2020-04-18T17:58:43Z)
- A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms [67.67377846416106]
We present a distributional approach to the theoretical analysis of reinforcement learning algorithms with constant step-sizes.
We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv Detail & Related papers (2020-03-27T05:13:29Z)
- Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving [106.63673243937492]
Feedforward computation, such as evaluating a neural network or sampling from an autoregressive model, is ubiquitous in machine learning.
We frame the task of feedforward computation as solving a system of nonlinear equations. We then propose to find the solution using a Jacobi or Gauss-Seidel fixed-point method, as well as hybrid methods of both (a minimal sketch of the Jacobi variant appears after this list).
Our method is guaranteed to give exactly the same values as the original feedforward computation with a reduced (or equal) number of parallelizable iterations, and hence reduced time given sufficient parallel computing power.
arXiv Detail & Related papers (2020-02-10T10:11:31Z)
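As a concrete illustration of the last entry above (Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving), here is a minimal sketch of the Jacobi fixed-point view: every layer output is updated from the previous iterate, so all per-layer evaluations are independent and parallelizable, and with L layers the iteration reproduces the sequential forward pass after at most L sweeps. The toy layers, names, and the equal-shape assumption are illustrative, not taken from that paper.

```python
# Hedged sketch: Jacobi fixed-point iteration for feedforward computation.
# Solves x_i = f_i(x_{i-1}) for all layers simultaneously; with L layers it
# matches the sequential forward pass after at most L iterations.
import numpy as np

def jacobi_feedforward(layers, x0, max_iters=None, tol=1e-10):
    L = len(layers)
    max_iters = max_iters or L
    xs = [np.zeros_like(x0) for _ in range(L)]   # guesses for all layer outputs
    for _ in range(max_iters):
        prev = [x0] + xs[:-1]
        # Jacobi step: every layer reads only the previous iterate, so all L
        # evaluations below could run in parallel.
        new_xs = [f(p) for f, p in zip(layers, prev)]
        converged = all(np.allclose(a, b, atol=tol) for a, b in zip(new_xs, xs))
        xs = new_xs
        if converged:
            break
    return xs[-1]

# Usage on toy shape-preserving "layers": agrees with sequential evaluation.
fs = [lambda x: np.tanh(x + 1.0), lambda x: 2.0 * x, lambda x: x - 0.5]
x0 = np.ones(3)
assert np.allclose(jacobi_feedforward(fs, x0), fs[2](fs[1](fs[0](x0))))
```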
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.