SGD with Coordinate Sampling: Theory and Practice
- URL: http://arxiv.org/abs/2105.11818v1
- Date: Tue, 25 May 2021 10:37:50 GMT
- Title: SGD with Coordinate Sampling: Theory and Practice
- Authors: Rémi Leluc and François Portier
- Abstract summary: MUSKETEER is an adaptive gradient descent algorithm for large scale problems.
It samples the most promising coordinate based on the noisy gradients and moves along the one direction yielding an important decrease of the objective.
Numerical experiments on both synthetic and real data examples confirm the effectiveness of MUSKETEER in large scale problems.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While classical forms of stochastic gradient descent algorithm treat the
different coordinates in the same way, a framework allowing for adaptive (non-uniform) coordinate sampling is developed to leverage structure in data. In a
non-convex setting and including zeroth-order gradient estimates, almost sure
convergence as well as non-asymptotic bounds are established. Within the
proposed framework, we develop an algorithm, MUSKETEER, based on a
reinforcement strategy: after collecting information on the noisy gradients, it
samples the most promising coordinate (all for one); then it moves along the
one direction yielding an important decrease of the objective (one for all).
Numerical experiments on both synthetic and real data examples confirm the
effectiveness of MUSKETEER in large scale problems.
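The "all for one / one for all" scheme described in the abstract can be illustrated with a small sketch. The code below is not the authors' MUSKETEER implementation: the cumulative |gradient| scoring rule, the uniform exploration mixture, the 1/sqrt(t) step size and the test objective are all illustrative assumptions made for this example.

```python
import numpy as np

def coordinate_sampling_sgd(grad_fn, x0, n_iters=1000, lr=0.1,
                            explore=0.1, seed=0):
    """Toy adaptive coordinate-sampling SGD in the spirit of MUSKETEER.

    grad_fn(x) returns a noisy estimate of the full gradient at x.
    Each step samples one coordinate from a distribution reinforced by
    the magnitudes of past noisy gradients ("all for one") and moves
    along that single direction ("one for all"). The scoring rule,
    exploration mixture and step sizes are illustrative, not the
    paper's exact choices.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    scores = np.zeros(d)                      # accumulated gradient information per coordinate

    for t in range(1, n_iters + 1):
        g = grad_fn(x)                        # noisy gradient estimate
        scores += np.abs(g)                   # collect information on the noisy gradients

        # reinforced sampling distribution, mixed with uniform exploration
        probs = (1.0 - explore) * scores / scores.sum() + explore / d

        k = rng.choice(d, p=probs)            # sample the most promising coordinate
        # importance-weighted single-coordinate step: in expectation over the
        # sampled coordinate it equals the full-gradient step scaled by 1/d
        x[k] -= (lr / np.sqrt(t)) * g[k] / (d * probs[k])
    return x

# usage: a noisy ill-conditioned quadratic; the objective value should drop markedly
rng = np.random.default_rng(1)
A = np.diag(np.linspace(1.0, 10.0, 20))
noisy_grad = lambda x: A @ x + 0.1 * rng.standard_normal(x.size)
x0 = np.ones(20)
x_end = coordinate_sampling_sgd(noisy_grad, x0)
print(0.5 * x0 @ A @ x0, 0.5 * x_end @ A @ x_end)
```

The importance weight 1/(d*probs[k]) keeps the single-coordinate update unbiased for the (scaled) full-gradient step, so reinforcing some coordinates changes how often each direction is visited without biasing the descent direction.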
Related papers
- A Historical Trajectory Assisted Optimization Method for Zeroth-Order Federated Learning [24.111048817721592]
Federated learning heavily relies on distributed gradient descent techniques.
In the situation where gradient information is not available, gradients need to be estimated from zeroth-order information.
We propose a non-isotropic sampling method to improve the gradient estimation procedure.
arXiv Detail & Related papers (2024-09-24T10:36:40Z) - An Asymptotically Optimal Coordinate Descent Algorithm for Learning Bayesian Networks from Gaussian Models [6.54203362045253]
We study the problem of learning networks from continuous observational data, generated according to a linear Gaussian structural equation model.
We propose a new coordinate descent algorithm that converges to the optimal objective value of the $\ell$-penalized maximum likelihood.
arXiv Detail & Related papers (2024-08-21T20:18:03Z) - Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit gradient descent (FO-SGD), relies on two simple algorithmic ideas.
arXiv Detail & Related papers (2024-05-17T21:17:27Z) - Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z) - Tackling Data Heterogeneity: A New Unified Framework for Decentralized
SGD with Sample-induced Topology [6.6682038218782065]
We develop a general framework unifying several gradient-based optimization methods for empirical risk minimization problems.
We provide a unified perspective for variance-reduction (VR) and gradient-tracking (GT) methods such as SAGA, Local-SVRG and GT-SAGA.
The rate results reveal that VR and GT methods can effectively eliminate data heterogeneity within and across devices, respectively, enabling exact convergence of the algorithm to the optimal solution.
arXiv Detail & Related papers (2022-07-08T07:50:08Z) - Sensing Cox Processes via Posterior Sampling and Positive Bases [56.82162768921196]
We study adaptive sensing of point processes, a widely used model from spatial statistics.
We model the intensity function as a sample from a truncated Gaussian process, represented in a specially constructed positive basis.
Our adaptive sensing algorithms use Langevin dynamics and are based on posterior sampling (Cox-Thompson) and top-two posterior sampling (Top2) principles.
arXiv Detail & Related papers (2021-10-21T14:47:06Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work concerns zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both iteration complexity and function query cost (a generic coordinate-wise zeroth-order estimator is sketched after this list).
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Model identification and local linear convergence of coordinate descent [74.87531444344381]
We show that cyclic coordinate descent achieves model identification in finite time for a wide class of functions.
We also prove explicit local linear convergence rates for coordinate descent.
arXiv Detail & Related papers (2020-10-22T16:03:19Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.