Stochastic Damped L-BFGS with Controlled Norm of the Hessian
Approximation
- URL: http://arxiv.org/abs/2012.05783v1
- Date: Thu, 10 Dec 2020 16:19:02 GMT
- Title: Stochastic Damped L-BFGS with Controlled Norm of the Hessian
Approximation
- Authors: Sanae Lotfi and Tiphaine Bonniot de Ruisselet and Dominique Orban and
Andrea Lodi
- Abstract summary: We propose a new stochastic variance-reduced damped L-BFGS algorithm, where we leverage estimates of bounds on the largest and smallest eigenvalues of the Hessian approximation to balance its quality and conditioning.
Our algorithm, VARCHEN, draws from previous work that proposed a novel stochastic damped L-BFGS algorithm called SdLBFGS.
We demonstrate that VARCHEN is more robust than SdLBFGS-VR and SVRG on a modified DavidNet problem.
- Score: 3.0204520109309843
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We propose a new stochastic variance-reduced damped L-BFGS algorithm, where
we leverage estimates of bounds on the largest and smallest eigenvalues of the
Hessian approximation to balance its quality and conditioning. Our algorithm,
VARCHEN, draws from previous work that proposed a novel stochastic damped
L-BFGS algorithm called SdLBFGS. We establish almost sure convergence to a
stationary point and a complexity bound. We empirically demonstrate that
VARCHEN is more robust than SdLBFGS-VR and SVRG on a modified DavidNet problem
-- a highly nonconvex and ill-conditioned problem that arises in the context of
deep learning, and their performance is comparable on a logistic regression
problem and a nonconvex support-vector machine problem.
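As a rough illustration of the kind of update the abstract describes, the sketch below implements a stochastic damped L-BFGS step in Python: curvature pairs are Powell-damped against the initial matrix, and a pair is stored only if a Rayleigh-quotient curvature estimate stays inside a prescribed interval, which stands in for the paper's estimates of bounds on the largest and smallest eigenvalues of the Hessian approximation. This is not the authors' VARCHEN implementation; the damping rule, the guard, the constants, and all function names are illustrative assumptions.

```python
# Minimal sketch of a stochastic damped L-BFGS step with a crude curvature
# guard, in the spirit of SdLBFGS / VARCHEN.  NOT the authors' code: the
# damping rule, the guard, and all constants below are assumptions.

import numpy as np


def two_loop_direction(grad, memory, gamma):
    """Standard L-BFGS two-loop recursion.

    Returns -H_k @ grad, where H_k is built from the stored (s, y, rho)
    pairs and the initial inverse Hessian H_k^0 = gamma * I.
    """
    q = grad.copy()
    alphas = []
    for s, y, rho in reversed(memory):                    # newest pair first
        a = rho * s.dot(q)
        alphas.append(a)
        q -= a * y
    r = gamma * q
    for (s, y, rho), a in zip(memory, reversed(alphas)):  # oldest pair first
        b = rho * y.dot(r)
        r += (a - b) * s
    return -r


def damped_pair(s, y, gamma, c=0.25):
    """Powell-style damping against B_0 = (1/gamma) * I.

    Ensures s.dot(y_bar) > 0, so the Hessian approximation stays
    positive definite after the update.
    """
    Bs = s / gamma
    sBs = s.dot(Bs)
    sy = s.dot(y)
    theta = 1.0 if sy >= c * sBs else (1.0 - c) * sBs / (sBs - sy)
    y_bar = theta * y + (1.0 - theta) * Bs
    return s, y_bar, 1.0 / s.dot(y_bar)


def damped_lbfgs_step(x, stoch_grad, memory, lr=0.1, mem_size=10,
                      lam_min=1e-3, lam_max=1e3):
    """One stochastic damped L-BFGS iteration with a curvature guard.

    The guard keeps the Rayleigh quotient s.y_bar / s.s inside
    [lam_min, lam_max], a crude stand-in for controlling the extreme
    eigenvalues of the Hessian approximation.
    """
    g = stoch_grad(x)
    gamma = 1.0
    if memory:
        s_last, y_last, _ = memory[-1]
        gamma = max(s_last.dot(y_last) / y_last.dot(y_last), 1e-8)

    d = two_loop_direction(g, memory, gamma)
    x_new = x + lr * d

    s = x_new - x
    y = stoch_grad(x_new) - g       # a variance-reduced gradient could be used here
    s, y_bar, rho = damped_pair(s, y, gamma)

    curv = s.dot(y_bar) / s.dot(s)  # eigenvalue proxy for the new pair
    if lam_min <= curv <= lam_max:  # reject pairs that would spoil conditioning
        memory.append((s, y_bar, rho))
        if len(memory) > mem_size:
            memory.pop(0)
    return x_new, memory


if __name__ == "__main__":
    # Toy ill-conditioned quadratic with noisy gradients, just to exercise the step.
    rng = np.random.default_rng(0)
    A = np.diag([1.0, 10.0, 100.0])

    def noisy_grad(x):
        return A @ x + 0.01 * rng.standard_normal(x.shape)

    x, mem = np.array([1.0, 1.0, 1.0]), []
    for _ in range(200):
        x, mem = damped_lbfgs_step(x, noisy_grad, mem)
    print("final iterate:", x)
```

A full variance-reduced variant in the sense of the abstract would replace the plain stochastic gradients above with SVRG-style estimates; the toy driver only exercises the step on a noisy quadratic.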
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - Taming Nonconvex Stochastic Mirror Descent with General Bregman
Divergence [25.717501580080846]
This paper revisits the convergence of stochastic mirror descent (SMD) in the contemporary nonconvex optimization setting.
We also develop provably convergent algorithms for the problem of training linear neural networks.
arXiv Detail & Related papers (2024-02-27T17:56:49Z) - Diffusion Stochastic Optimization for Min-Max Problems [33.73046548872663]
The optimistic gradient method is useful in addressing minimax optimization problems.
Motivated by the observation that the conventional version suffers from the need for a large batch size, we introduce and analyze a new same-sample formulation of the optimistic gradient method.
arXiv Detail & Related papers (2024-01-26T01:16:59Z) - Can Decentralized Stochastic Minimax Optimization Algorithms Converge
Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its application to a wide range of machine learning problems.
This paper develops two novel decentralized minimax optimization algorithms for the finite-sum nonconvex-nonconcave problem.
arXiv Detail & Related papers (2023-04-24T02:19:39Z) - An Inexact Augmented Lagrangian Algorithm for Training Leaky ReLU Neural
Network with Group Sparsity [13.27709100571336]
A leaky ReLU network with a group regularization term has been widely used in recent years.
We observe that approaches to compute a stationary point of such a model deterministically are lacking.
We propose an inexact augmented Lagrangian algorithm for solving the new model.
arXiv Detail & Related papers (2022-05-11T11:53:15Z) - Accelerated and instance-optimal policy evaluation with linear function
approximation [17.995515643150657]
Existing algorithms fail to match at least one of the known lower bounds for this problem.
We develop an accelerated, variance-reduced fast temporal difference algorithm that simultaneously matches both lower bounds and attains a strong notion of instance-optimality.
arXiv Detail & Related papers (2021-12-24T17:21:04Z) - Robust lEarned Shrinkage-Thresholding (REST): Robust unrolling for
sparse recovery [87.28082715343896]
We consider deep neural networks for solving inverse problems that are robust to forward model mis-specifications.
We design a new robust deep neural network architecture by applying algorithm unfolding techniques to a robust version of the underlying recovery problem.
The proposed REST network is shown to outperform state-of-the-art model-based and data-driven algorithms in both compressive sensing and radar imaging problems.
arXiv Detail & Related papers (2021-10-20T06:15:45Z) - Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth
Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z) - Stochastic Hard Thresholding Algorithms for AUC Maximization [49.00683387735522]
We develop a stochastic hard thresholding algorithm for AUC maximization.
We conduct experiments to show the efficiency and effectiveness of the proposed algorithms.
arXiv Detail & Related papers (2020-11-04T16:49:29Z) - Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave
Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on stochastic methods for solving smooth nonconvex strongly-concave min-max problems, which have received increasing attention due to their applications in deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.