SGD with Coordinate Sampling: Theory and Practice
- URL: http://arxiv.org/abs/2105.11818v1
- Date: Tue, 25 May 2021 10:37:50 GMT
- Title: SGD with Coordinate Sampling: Theory and Practice
- Authors: Rémi Leluc and François Portier
- Abstract summary: MUSKETEER is an adaptive gradient descent algorithm for large scale problems.
It collects information on the noisy gradients, samples the most promising coordinate, and moves along the direction yielding an important decrease of the objective.
Numerical experiments on both synthetic and real data examples confirm the effectiveness of MUSKETEER in large scale problems.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While classical forms of the stochastic gradient descent algorithm
treat the different coordinates in the same way, a framework allowing for
adaptive (non-uniform) coordinate sampling is developed to leverage structure in
data. In a non-convex setting and including zeroth-order gradient estimates,
almost sure convergence as well as non-asymptotic bounds are established. Within the
proposed framework, we develop an algorithm, MUSKETEER, based on a
reinforcement strategy: after collecting information on the noisy gradients, it
samples the most promising coordinate (all for one); then it moves along the
one direction yielding an important decrease of the objective (one for all).
Numerical experiments on both synthetic and real data examples confirm the
effectiveness of MUSKETEER in large scale problems.
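The all-for-one/one-for-all loop described in the abstract can be sketched as follows. This is only an illustrative reading of the abstract, not the paper's exact MUSKETEER algorithm: the score recursion, the softmax sampling distribution, and the step size are hypothetical choices.

```python
import numpy as np

def adaptive_coordinate_sgd(grad_fn, x0, steps=200, lr=0.05, eta=0.5, seed=0):
    """Illustrative adaptive coordinate sampling for SGD.

    "All for one": accumulate a running score per coordinate from the
    observed noisy gradients.  "One for all": sample a single promising
    coordinate from the induced distribution and update only it.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    scores = np.zeros(d)                       # running gradient-magnitude scores
    for _ in range(steps):
        g = grad_fn(x)                         # noisy gradient oracle
        scores = eta * scores + (1 - eta) * np.abs(g)
        p = np.exp(scores - scores.max())      # softmax over scores
        p /= p.sum()
        k = rng.choice(d, p=p)                 # sample one coordinate
        x[k] -= lr * g[k]                      # move along that direction only
    return x

# Toy quadratic with one dominant coordinate: gradient of sum_i a_i x_i^2.
quad_grad = lambda x: 2.0 * np.array([10.0, 1.0, 1.0]) * x
x_star = adaptive_coordinate_sgd(quad_grad, np.ones(3))
```

Note that an unbiased variant would reweight the step by the inverse sampling probability; it is omitted here to keep the toy loop stable.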
Related papers
- Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance [55.01966743652196]
We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework.
Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas.
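The one-bit idea can be illustrated with generic sign-based compression: transmit one sign bit per coordinate plus a single scale factor. The helpers below are hypothetical and do not reproduce FO-SGD's flattening or variance-control ideas.

```python
import numpy as np

def compress_one_bit(g):
    """Generic one-bit gradient compression: sign bits plus one scalar."""
    scale = np.abs(g).mean()                 # a single float per vector
    return np.sign(g).astype(np.int8), scale # 1 bit per coordinate (as int8 here)

def decompress_one_bit(signs, scale):
    """Reconstruct an approximate gradient from signs and the scale."""
    return scale * signs.astype(float)

g = np.array([0.5, -2.0, 1.5, -0.1])
signs, s = compress_one_bit(g)
g_hat = decompress_one_bit(signs, s)         # same signs as g, shared magnitude
```

The reconstruction preserves every coordinate's direction while communicating only the signs and one scalar, which is the kind of trade-off such compressed-SGD schemes analyze.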
arXiv Detail & Related papers (2024-05-17T21:17:27Z)
- Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z)
- Tackling Data Heterogeneity: A New Unified Framework for Decentralized SGD with Sample-induced Topology [6.6682038218782065]
We develop a general framework unifying several gradient-based optimization methods for empirical risk minimization problems.
We provide a unified perspective for variance-reduction (VR) and gradient-tracking (GT) methods such as SAGA, Local-SVRG and GT-SAGA.
The rate results reveal that VR and GT methods can effectively eliminate data heterogeneity within and across devices, respectively, enabling the exact convergence of the algorithm to the optimal solution.
arXiv Detail & Related papers (2022-07-08T07:50:08Z)
- Sensing Cox Processes via Posterior Sampling and Positive Bases [56.82162768921196]
We study adaptive sensing of point processes, a widely used model from spatial statistics.
We model the intensity function as a sample from a truncated Gaussian process, represented in a specially constructed positive basis.
Our adaptive sensing algorithms use Langevin dynamics and are based on posterior sampling (Cox-Thompson) and top-two posterior sampling (Top2) principles.
arXiv Detail & Related papers (2021-10-21T14:47:06Z)
- COCO Denoiser: Using Co-Coercivity for Variance Reduction in Stochastic Convex Optimization [4.970364068620608]
We exploit convexity and L-smoothness to improve the noisy estimates outputted by the gradient oracle.
We show that increasing the number and proximity of the queried points leads to better gradient estimates.
We also apply COCO in vanilla settings by plugging it into existing algorithms, such as SGD, Adam or STRSAGA.
arXiv Detail & Related papers (2021-09-07T17:21:09Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as function query cost.
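A two-point coordinate-wise ZO gradient estimate, combined with a simple non-uniform coordinate sampler, might look as follows. The weighting scheme is an illustrative assumption, not the paper's design.

```python
import numpy as np

def zo_partial(f, x, k, mu=1e-4):
    # Two-point zeroth-order estimate of the k-th partial derivative,
    # using only function evaluations (no first-order information).
    e = np.zeros_like(x)
    e[k] = 1.0
    return (f(x + mu * e) - f(x - mu * e)) / (2.0 * mu)

# Toy objective with one ill-scaled coordinate.
f = lambda v: v[0] ** 2 + 10.0 * v[1] ** 2

rng = np.random.default_rng(0)
x = np.array([1.0, 1.0])
weights = np.ones(2)                  # running importance estimates (hypothetical)
for _ in range(300):
    p = weights / weights.sum()       # non-uniform coordinate distribution
    k = rng.choice(2, p=p)
    g_k = zo_partial(f, x, k)         # two function queries per step
    weights[k] = 0.9 * weights[k] + 0.1 * abs(g_k)
    x[k] -= 0.04 * g_k
```

Each iteration costs exactly two function queries, which is why query-efficient coordinate selection matters in the ZO setting.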
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
- Stochastic Compositional Gradient Descent under Compositional Constraints [13.170519806372075]
We study constrained optimization problems where the objective and constraint functions are convex and expressed as compositions of functions.
The problem arises in fair classification/regression and in the design of queuing systems.
We prove that the proposed algorithm is guaranteed to find the optimal and feasible solution almost surely.
arXiv Detail & Related papers (2020-12-17T05:38:37Z)
- Model identification and local linear convergence of coordinate descent [74.87531444344381]
We show that cyclic coordinate descent achieves model identification in finite time for a wide class of functions.
We also prove explicit local linear convergence rates for coordinate descent.
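Cyclic coordinate descent on the Lasso, with exact per-coordinate minimization via soft-thresholding, is a standard instance of this setting; in the orthogonal toy problem below the sparse support (the "model") is identified after the first sweep. The helper names are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the exact coordinate-wise Lasso minimizer."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(A, b, lam, iters=50):
    """Cyclic coordinate descent for 0.5*||Ax - b||^2 + lam*||x||_1."""
    n, d = A.shape
    x = np.zeros(d)
    col_sq = (A ** 2).sum(axis=0)          # per-column squared norms
    r = b - A @ x                          # current residual
    for _ in range(iters):
        for j in range(d):                 # one cyclic sweep
            r += A[:, j] * x[j]            # remove coordinate j from residual
            x[j] = soft_threshold(A[:, j] @ r, lam) / col_sq[j]
            r -= A[:, j] * x[j]            # put the updated coordinate back
    return x

# Orthogonal design: the Lasso solution is soft-thresholding of b,
# so the support {0, 3} is found in a single sweep.
A = np.eye(5)
b = np.array([3.0, 0.05, 0.0, -2.0, 0.0])
x = lasso_cd(A, b, lam=0.1)
```

Once the support stops changing, the iteration behaves like coordinate descent on a smooth restricted problem, which is where local linear rates apply.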
arXiv Detail & Related papers (2020-10-22T16:03:19Z)
- Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.