Adam-family Methods for Nonsmooth Optimization with Convergence
Guarantees
- URL: http://arxiv.org/abs/2305.03938v2
- Date: Mon, 19 Feb 2024 07:59:56 GMT
- Title: Adam-family Methods for Nonsmooth Optimization with Convergence
Guarantees
- Authors: Nachuan Xiao, Xiaoyin Hu, Xin Liu, Kim-Chuan Toh
- Abstract summary: We introduce a novel framework that adopts a two-timescale updating scheme and prove its convergence properties under mild assumptions.
Our proposed framework encompasses various popular Adam-family methods, providing convergence guarantees for these methods in training nonsmooth neural networks.
We develop stochastic subgradient methods that incorporate gradient clipping techniques for training nonsmooth neural networks with heavy-tailed noise.
- Score: 5.69991777684143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a comprehensive study on the convergence properties
of Adam-family methods for nonsmooth optimization, especially in the training
of nonsmooth neural networks. We introduce a novel framework that adopts a
two-timescale updating scheme and prove its convergence properties
under mild assumptions. Our proposed framework encompasses various popular
Adam-family methods, providing convergence guarantees for these methods in
training nonsmooth neural networks. Furthermore, we develop stochastic
subgradient methods that incorporate gradient clipping techniques for training
nonsmooth neural networks with heavy-tailed noise. Through our framework, we
show that our proposed methods converge even when the evaluation noises are
only assumed to be integrable. Extensive numerical experiments demonstrate the
high efficiency and robustness of our proposed methods.
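As a rough illustration of the kind of update the framework covers, the Python sketch below combines an Adam-style step with a two-timescale stepsize schedule (the momentum estimate is updated on a faster stepsize than the iterate) and an optional clipping step for heavy-tailed gradient noise. The function name, stepsize exponents, and the toy l1 example are illustrative assumptions, not the paper's reference implementation or its exact stepsize conditions.

```python
# Minimal sketch of a two-timescale Adam-family update with optional clipping.
# Illustrative only: the stepsize schedules and constants are assumptions,
# not the conditions analyzed in the paper.
import numpy as np

def two_timescale_adam_step(x, m, v, g, k, beta=0.999, eps=1e-8, clip_tau=None):
    """One update: the momentum estimate m moves on a faster stepsize than x."""
    if clip_tau is not None:
        # Norm clipping of the stochastic (sub)gradient for heavy-tailed noise.
        g = g * min(1.0, clip_tau / (np.linalg.norm(g) + 1e-12))
    eta_fast = 1.0 / (k + 1) ** 0.5    # faster timescale: momentum tracking
    eta_slow = 1.0 / (k + 1) ** 0.75   # slower timescale: iterate update
    m = (1.0 - eta_fast) * m + eta_fast * g        # first-moment estimate
    v = beta * v + (1.0 - beta) * g * g            # second-moment estimate
    x = x - eta_slow * m / (np.sqrt(v) + eps)      # Adam-style preconditioned step
    return x, m, v

# Toy usage on the nonsmooth objective f(x) = ||x||_1 with heavy-tailed noise.
x, m, v = np.ones(5), np.zeros(5), np.zeros(5)
for k in range(2000):
    g = np.sign(x) + 0.1 * np.random.standard_t(df=2, size=5)
    x, m, v = two_timescale_adam_step(x, m, v, g, k, clip_tau=5.0)
```

The separation of stepsizes mirrors the two-timescale idea in the abstract: the momentum estimator is updated more aggressively than the iterate itself.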
Related papers
- Enhancing CNN Classification with Lamarckian Memetic Algorithms and Local Search [0.0]
We propose a novel approach integrating a two-stage training technique with population-based optimization algorithms incorporating local search capabilities.
Our experiments demonstrate that the proposed method outperforms state-of-the-art gradient-based techniques.
arXiv Detail & Related papers (2024-10-26T17:31:15Z) - Adam-family Methods with Decoupled Weight Decay in Deep Learning [3.4376560669160394]
We investigate the convergence properties of a wide class of Adam-family methods for training nonsmooth neural networks.
We propose a novel Adam-family method named Adam with Decoupled Weight Decay (AdamD) in our proposed framework (a generic sketch of decoupled weight decay appears after this list).
arXiv Detail & Related papers (2023-10-13T04:59:44Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - A Novel Noise Injection-based Training Scheme for Better Model
Robustness [9.749718440407811]
Noise injection-based methods have been shown to improve the robustness of artificial neural networks.
In this work, we propose a novel noise injection-based training scheme for better model robustness.
Experiment results show that our proposed method achieves a much better performance on adversarial robustness and slightly better performance on original accuracy.
arXiv Detail & Related papers (2023-02-17T02:50:25Z) - Expeditious Saliency-guided Mix-up through Random Gradient Thresholding [89.59134648542042]
Mix-up training approaches have proven to be effective in improving the generalization ability of Deep Neural Networks.
In this paper, inspired by the strengths each direction has over the other, we introduce a novel method that lies at the junction of the two routes.
We name our method R-Mix, following the concept of "Random Mix-up".
In order to address the question of whether there exists a better decision protocol, we train a Reinforcement Learning agent that decides the mix-up policies.
arXiv Detail & Related papers (2022-12-09T14:29:57Z) - Guaranteed Conservation of Momentum for Learning Particle-based Fluid
Dynamics [96.9177297872723]
We present a novel method for guaranteeing conservation of linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
arXiv Detail & Related papers (2022-10-12T09:12:59Z) - Tree ensemble kernels for Bayesian optimization with known constraints
over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z) - Practical Convex Formulation of Robust One-hidden-layer Neural Network
Training [12.71266194474117]
We show that the training of a one-hidden-layer, scalar-output fully-connected ReLU neural network can be reformulated as a finite-dimensional convex program.
We derive a convex optimization approach to efficiently solve the "adversarial training" problem.
Our method can be applied to binary classification and regression, and provides an alternative to the current adversarial training methods.
arXiv Detail & Related papers (2021-05-25T22:06:27Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Deep Magnification-Flexible Upsampling over 3D Point Clouds [103.09504572409449]
We propose a novel end-to-end learning-based framework to generate dense point clouds.
We first formulate the problem explicitly, which boils down to determining the weights and high-order approximation errors.
Then, we design a lightweight neural network to adaptively learn unified and sorted weights as well as the high-order refinements.
arXiv Detail & Related papers (2020-11-25T14:00:18Z)
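For the "Adam-family Methods with Decoupled Weight Decay" entry above, the sketch below shows the generic idea of decoupled weight decay in an Adam-style step (as popularized by AdamW): the decay is applied directly to the parameters instead of being folded into the gradient. This is an assumed, illustrative formulation, not the AdamD method or its analysis.

```python
# Generic Adam step with decoupled weight decay (AdamW-style illustration;
# not the AdamD method proposed in the paper).
import numpy as np

def adam_decoupled_wd_step(x, m, v, g, t, lr=1e-3, beta1=0.9, beta2=0.999,
                           eps=1e-8, weight_decay=1e-2):
    m = beta1 * m + (1.0 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)             # bias correction (t starts at 1)
    v_hat = v / (1.0 - beta2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)  # adaptive gradient step
    x = x - lr * weight_decay * x                # decay bypasses the preconditioner
    return x, m, v
```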