Feedback Gradient Descent: Efficient and Stable Optimization with
Orthogonality for DNNs
- URL: http://arxiv.org/abs/2205.08385v1
- Date: Thu, 12 May 2022 03:47:27 GMT
- Title: Feedback Gradient Descent: Efficient and Stable Optimization with
Orthogonality for DNNs
- Authors: Fanchen Bu, Dong Eui Chang
- Abstract summary: We propose a novel method, named Feedback Gradient Descent (FGD), to our knowledge, the first work showing high efficiency and stability simultaneously.
In the extensive image classification experiments, FGD comprehensively outperforms the existing state-of-the-art methods in terms of accuracy, efficiency, and stability.
- Score: 3.42658286826597
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The optimization with orthogonality has been shown useful in training deep
neural networks (DNNs). To impose orthogonality on DNNs, both computational
efficiency and stability are important. However, existing methods utilizing
Riemannian optimization or hard constraints can only ensure stability while
those using soft constraints can only improve efficiency. In this paper, we
propose a novel method, named Feedback Gradient Descent (FGD), which is, to our
knowledge, the first work to achieve high efficiency and stability simultaneously.
FGD induces orthogonality based on the simple yet indispensable Euler
discretization of a continuous-time dynamical system on the tangent bundle of
the Stiefel manifold. In particular, inspired by a numerical integration method
on manifolds called Feedback Integrators, we propose to instantiate it on the
tangent bundle of the Stiefel manifold for the first time. In the extensive
image classification experiments, FGD comprehensively outperforms the existing
state-of-the-art methods in terms of accuracy, efficiency, and stability.
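To make the core idea concrete, here is a minimal, illustrative sketch (not the paper's exact algorithm) of a "descent plus feedback" Euler step near the Stiefel manifold: the weights move along a projected gradient direction while a feedback term, the gradient of V(W) = 0.25 * ||W^T W - I||_F^2, continuously pulls them back toward orthogonality. The learning rate, feedback gain, and choice of tangent-space projection below are assumptions for illustration.

```python
import numpy as np

def fgd_style_step(W, G, lr=0.01, k=0.5):
    """One illustrative 'descent + feedback' Euler step near the Stiefel manifold.

    W : (n, p) weight matrix that should satisfy W^T W = I.
    G : (n, p) Euclidean gradient of the loss with respect to W.
    lr: step size for the descent part (assumed value).
    k : feedback gain attracting W back to the manifold (assumed value).
    """
    # Project the gradient onto the tangent space of the Stiefel manifold at W
    # (one standard choice; FGD's exact dynamics on the tangent bundle differ).
    sym = 0.5 * (W.T @ G + G.T @ W)
    G_tan = G - W @ sym
    # Feedback term: gradient of V(W) = 0.25 * ||W^T W - I||_F^2, zero on the manifold.
    feedback = W @ (W.T @ W - np.eye(W.shape[1]))
    return W - lr * G_tan - k * feedback

# Tiny check: the orthogonality error should remain small under repeated updates.
rng = np.random.default_rng(0)
W = np.linalg.qr(rng.standard_normal((8, 4)))[0]   # start on the manifold
G = rng.standard_normal((8, 4))                    # stand-in loss gradient
for _ in range(100):
    W = fgd_style_step(W, G)
print(np.linalg.norm(W.T @ W - np.eye(4)))         # orthogonality error stays small
```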
Related papers
- (Almost) Smooth Sailing: Towards Numerical Stability of Neural Networks Through Differentiable Regularization of the Condition Number [0.0]
We introduce a novel regularizer that is provably differentiable almost everywhere.
We show the advantages of this approach for noisy classification and denoising of MNIST images.
arXiv Detail & Related papers (2024-09-30T19:18:15Z)
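As a rough illustration of the idea (not the paper's exact regularizer), one can penalize the spectral condition number of a weight matrix computed from its singular values, which is differentiable wherever the extreme singular values are simple. The penalty weight and matrix shape below are made-up values.

```python
import torch

def condition_number_penalty(W, eps=1e-12):
    """Illustrative penalty kappa(W) = sigma_max / sigma_min.

    The paper's regularizer and its almost-everywhere differentiability
    analysis may differ in detail.
    """
    s = torch.linalg.svdvals(W)       # singular values in descending order
    return s[0] / (s[-1] + eps)

# Hypothetical usage: add the penalty to a task loss with a small weight.
W = torch.randn(64, 32, requires_grad=True)
task_loss = (W ** 2).mean()           # stand-in for the real task loss
loss = task_loss + 1e-3 * condition_number_penalty(W)
loss.backward()
```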
- Dynamic Anisotropic Smoothing for Noisy Derivative-Free Optimization [0.0]
We propose a novel algorithm that extends the methods of ball smoothing and Gaussian smoothing for noisy derivative-free optimization.
The algorithm dynamically adapts the shape of the smoothing kernel to approximate the Hessian of the objective function around a local optimum.
arXiv Detail & Related papers (2024-05-02T21:04:20Z)
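For context, the sketch below shows the basic isotropic Gaussian-smoothing gradient estimator that such derivative-free methods build on; the paper's contribution, adapting an anisotropic kernel to the local Hessian, is not reproduced here. The sample count and smoothing radius are arbitrary choices.

```python
import numpy as np

def gaussian_smoothed_grad(f, x, sigma=0.1, n_samples=64, rng=None):
    """Monte-Carlo estimate of the gradient of the Gaussian-smoothed objective:
    grad f_sigma(x) ~= E[ f(x + sigma * u) * u ] / sigma,  u ~ N(0, I)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal((n_samples, x.size))
    fvals = np.array([f(x + sigma * ui) for ui in u])
    return (fvals[:, None] * u).mean(axis=0) / sigma

# Hypothetical usage on a noisy quadratic objective.
f = lambda z: float(np.sum(z ** 2) + 0.01 * np.random.randn())
g = gaussian_smoothed_grad(f, np.ones(3), sigma=0.2)
```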
- Diagonalisation SGD: Fast & Convergent SGD for Non-Differentiable Models via Reparameterisation and Smoothing [1.6114012813668932]
We introduce a simple framework to define non-differentiable functions piecewise and present a systematic approach to obtain smoothings.
Our main contribution is a novel variant of SGD, Diagonalisation Gradient Descent, which progressively enhances the accuracy of the smoothed approximation.
Our approach is simple, fast, and stable, and attains orders-of-magnitude reduction in work-normalised variance.
arXiv Detail & Related papers (2024-02-19T00:43:22Z)
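A hedged sketch of the general recipe (SGD on a smoothed surrogate whose accuracy is progressively increased): the sigmoid smoothing, the schedule for the accuracy parameter, and the toy objective are illustrative assumptions, not the paper's construction.

```python
import torch

def smoothed_step(x, tau):
    """Sigmoid smoothing of the Heaviside step 1[x >= 0]; sharper as tau -> 0."""
    return torch.sigmoid(x / tau)

theta = torch.tensor(2.0, requires_grad=True)
opt = torch.optim.SGD([theta], lr=0.1)
for t in range(1, 201):
    tau = 1.0 / t ** 0.5                          # illustrative accuracy schedule
    # Toy non-differentiable objective: a quadratic plus a step penalty, smoothed.
    loss = (theta - 1.0) ** 2 + smoothed_step(theta, tau)
    opt.zero_grad()
    loss.backward()
    opt.step()
```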
- Achieving Constraints in Neural Networks: A Stochastic Augmented Lagrangian Approach [49.1574468325115]
Regularizing Deep Neural Networks (DNNs) is essential for improving generalizability and preventing overfitting.
We propose a novel approach to DNN regularization by framing the training process as a constrained optimization problem.
We employ the Stochastic Augmented Lagrangian (SAL) method to achieve a more flexible and efficient regularization mechanism.
arXiv Detail & Related papers (2023-10-25T13:55:35Z)
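For intuition, below is a generic augmented Lagrangian treatment of a single equality constraint c(theta) = 0 on the training loss, with the standard multiplier update. The constraint, penalty coefficient, and update schedule are illustrative, and the paper's stochastic (minibatch) variant differs.

```python
import torch

def augmented_lagrangian(loss, c, lam, rho):
    """L_A = loss + lam * c + (rho / 2) * c^2 for an equality constraint c(theta) = 0."""
    return loss + lam * c + 0.5 * rho * c ** 2

theta = torch.randn(100, requires_grad=True)
opt = torch.optim.SGD([theta], lr=0.05)
lam, rho = 0.0, 10.0                              # illustrative multiplier and penalty
for step in range(500):
    loss = ((theta - 1.0) ** 2).mean()            # stand-in training loss
    c = theta.norm() - 5.0                        # illustrative constraint ||theta|| = 5
    La = augmented_lagrangian(loss, c, lam, rho)
    opt.zero_grad()
    La.backward()
    opt.step()
    if step % 50 == 0:
        lam = lam + rho * float(c)                # dual (multiplier) update
```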
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to be effective in solving forward and inverse differential equation problems.
However, PINNs can be trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
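A minimal sketch of an implicit (backward-Euler) gradient step, solved here by gradient descent on the equivalent proximal subproblem; how ISGD handles minibatching and the PINN loss specifically is not shown, and the step sizes and iteration counts are assumptions.

```python
import numpy as np

def implicit_gd_step(grad, theta, lr=0.5, inner_lr=0.005, n_inner=200):
    """One implicit (backward-Euler) gradient step.

    theta_new solves y = theta - lr * grad(y), i.e. it minimizes
    f(y) + ||y - theta||^2 / (2 * lr). The inner problem is solved here by
    plain gradient descent; inner_lr and n_inner are illustrative choices.
    """
    y = theta.copy()
    for _ in range(n_inner):
        inner_grad = grad(y) + (y - theta) / lr
        y = y - inner_lr * inner_grad
    return y

# Hypothetical usage on a stiff quadratic f(x) = 0.5 * (x1^2 + 100 * x2^2),
# where explicit gradient descent with lr = 0.5 would diverge.
grad = lambda x: np.array([1.0, 100.0]) * x
x = np.array([1.0, 1.0])
for _ in range(20):
    x = implicit_gd_step(grad, x, lr=0.5)
print(x)   # should approach the minimizer at the origin
```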
- D4FT: A Deep Learning Approach to Kohn-Sham Density Functional Theory [79.50644650795012]
We propose a deep learning approach to solving Kohn-Sham Density Functional Theory (KS-DFT).
We prove that such an approach has the same expressivity as the SCF method, yet reduces the computational complexity.
In addition, we show that our approach enables us to explore more complex neural-based wave functions.
arXiv Detail & Related papers (2023-03-01T10:38:10Z)
- Breaking the Convergence Barrier: Optimization via Fixed-Time Convergent Flows [4.817429789586127]
We introduce a Poly-based optimization framework for achieving acceleration, based on the notion of fixed-time stable dynamical systems.
We validate the accelerated convergence properties of the proposed schemes on a range of numerical examples against the state-of-the-art optimization algorithms.
arXiv Detail & Related papers (2021-12-02T16:04:40Z)
- Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms is not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z)
- Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems in which one variable is subject to a sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
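As background for the bilinear setting, here is plain joint gradient descent on f(A, x) = 0.5 * ||y - A x||^2; CoGD's coupled update, which accounts for the interaction between A and the sparse x, is not reproduced, and the problem sizes and step size are arbitrary.

```python
import numpy as np

def bilinear_descent(y, A, x, lr=0.01, steps=500):
    """Joint gradient descent on f(A, x) = 0.5 * ||y - A @ x||^2."""
    for _ in range(steps):
        r = A @ x - y                 # residual with the current A, x
        grad_A = np.outer(r, x)       # d f / d A = r x^T
        grad_x = A.T @ r              # d f / d x = A^T r
        A = A - lr * grad_A
        x = x - lr * grad_x
    return A, x

rng = np.random.default_rng(0)
A_true, x_true = rng.standard_normal((10, 5)), rng.standard_normal(5)
y = A_true @ x_true
A, x = bilinear_descent(y, rng.standard_normal((10, 5)), rng.standard_normal(5))
print(np.linalg.norm(A @ x - y))      # residual should shrink toward zero
```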
- Controllable Orthogonalization in Training DNNs [96.1365404059924]
Orthogonality is widely used for training deep neural networks (DNNs) due to its ability to maintain all singular values of the Jacobian close to 1.
This paper proposes a computationally efficient and numerically stable orthogonalization method using Newton's iteration (ONI).
We show that our method improves the performance of image classification networks by effectively controlling the orthogonality to provide an optimal tradeoff between optimization benefits and representational capacity reduction.
We also show that ONI stabilizes the training of generative adversarial networks (GANs) by maintaining the Lipschitz continuity of a network, similar to spectral normalization.
arXiv Detail & Related papers (2020-04-02T10:14:27Z)
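For reference, the sketch below shows the classical Newton-Schulz iteration for orthogonalizing a matrix, which converges to its orthogonal polar factor; ONI's exact normalization, iteration count, and proxy-parameter construction may differ.

```python
import numpy as np

def newton_schulz_orthogonalize(V, n_iters=25):
    """Orthogonalize V via the Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X.

    The Frobenius scaling puts all singular values in (0, 1], inside the
    iteration's convergence region, so X converges to the orthogonal polar
    factor of V (the nearest (semi-)orthogonal matrix).
    """
    X = V / np.linalg.norm(V)                      # Frobenius scaling
    for _ in range(n_iters):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

rng = np.random.default_rng(0)
W = newton_schulz_orthogonalize(rng.standard_normal((4, 4)))
print(np.linalg.norm(W @ W.T - np.eye(4)))         # orthogonality error, near zero
```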
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.