COCO Denoiser: Using Co-Coercivity for Variance Reduction in Stochastic
Convex Optimization
- URL: http://arxiv.org/abs/2109.03207v1
- Date: Tue, 7 Sep 2021 17:21:09 GMT
- Title: COCO Denoiser: Using Co-Coercivity for Variance Reduction in Stochastic
Convex Optimization
- Authors: Manuel Madeira, Renato Negrinho, Jo\~ao Xavier and Pedro M. Q. Aguiar
- Abstract summary: We exploit convexity and L-smoothness to improve the noisy estimates outputted by the gradient oracle.
We show that increasing the number and proximity of the queried points leads to better gradient estimates.
We also apply COCO in vanilla settings by plugging it in existing algorithms, such as SGD, Adam or STRSAGA.
- Score: 4.970364068620608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: First-order methods for stochastic optimization have undeniable relevance, in
part due to their pivotal role in machine learning. Variance reduction for
these algorithms has become an important research topic. In contrast to common
approaches, which rarely leverage global models of the objective function, we
exploit convexity and L-smoothness to improve the noisy estimates outputted by
the stochastic gradient oracle. Our method, named COCO denoiser, is the joint
maximum likelihood estimator of multiple function gradients from their noisy
observations, subject to co-coercivity constraints between them. The resulting
estimate is the solution of a convex Quadratically Constrained Quadratic
Problem. Although this problem is expensive to solve by interior point methods,
we exploit its structure to apply an accelerated first-order algorithm, the
Fast Dual Proximal Gradient method. Besides analytically characterizing the
proposed estimator, we show empirically that increasing the number and
proximity of the queried points leads to better gradient estimates. We also
apply COCO in stochastic settings by plugging it in existing algorithms, such
as SGD, Adam or STRSAGA, outperforming their vanilla versions, even in
scenarios where our modelling assumptions are mismatched.
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust Sequential Quadratic Programming method to find both first and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration defined by minimizing a approximation of the objective subject.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm is only accessible to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quant clipping strategy for Gradient Descent (SGD)
We use gradient new outliers as norm clipping chains.
We propose an implementation of the algorithm using Huberiles.
arXiv Detail & Related papers (2023-09-29T15:24:48Z) - Accelerated stochastic approximation with state-dependent noise [7.4648480208501455]
We consider a class of smooth convex optimization problems under general assumptions on the quadratic noise in the gradient observation.
Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics.
We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate.
arXiv Detail & Related papers (2023-07-04T06:06:10Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on the iteration of zero-th-order (ZO) optimization which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Stochastic Compositional Gradient Descent under Compositional
constraints [13.170519806372075]
We study constrained optimization problems where the objective and constraint functions are convex and expressed as compositions of functions.
The problem arises in fair classification/regression and in the design of queuing systems.
We prove that the proposed algorithm is guaranteed to find the optimal and feasible solution almost surely.
arXiv Detail & Related papers (2020-12-17T05:38:37Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Non-asymptotic bounds for stochastic optimization with biased noisy
gradient oracles [8.655294504286635]
We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error.
Our proposed oracles are in practical contexts, for instance, risk measure estimation from a batch of independent and identically distributed simulation.
arXiv Detail & Related papers (2020-02-26T12:53:04Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems.
Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.