Exact Stochastic Second Order Deep Learning
- URL: http://arxiv.org/abs/2104.03804v1
- Date: Thu, 8 Apr 2021 14:29:31 GMT
- Title: Exact Stochastic Second Order Deep Learning
- Authors: Fares B. Mehouachi, Chaouki Kasmi
- Abstract summary: Deep Learning is mainly dominated by first-order methods which are built around the central concept of backpropagation.
Second-order methods, take into account second-order derivatives less used than first-order methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Optimization in Deep Learning is mainly dominated by first-order methods
which are built around the central concept of backpropagation. Second-order
optimization methods, which take into account the second-order derivatives are
far less used despite superior theoretical properties. This inadequacy of
second-order methods stems from its exorbitant computational cost, poor
performance, and the ineluctable non-convex nature of Deep Learning. Several
attempts were made to resolve the inadequacy of second-order optimization
without reaching a cost-effective solution, much less an exact solution. In
this work, we show that this long-standing problem in Deep Learning could be
solved in the stochastic case, given a suitable regularization of the neural
network. Interestingly, we provide an expression of the stochastic Hessian and
its exact eigenvalues. We provide a closed-form formula for the exact
stochastic second-order Newton direction, we solve the non-convexity issue and
adjust our exact solution to favor flat minima through regularization and
spectral adjustment. We test our exact stochastic second-order method on
popular datasets and reveal its adequacy for Deep Learning.
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a ii optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm is only accessible to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Discretize Relaxed Solution of Spectral Clustering via a Non-Heuristic
Algorithm [77.53604156112144]
We develop a first-order term to bridge the original problem and discretization algorithm.
Since the non-heuristic method is aware of the original graph cut problem, the final discrete solution is more reliable.
arXiv Detail & Related papers (2023-10-19T13:57:38Z) - A deep learning method for solving stochastic optimal control problems driven by fully-coupled FBSDEs [1.0703175070560689]
We first transform the problem into a Stackelberg differential game problem (leader-follower problem)
We compute two examples of the investment-consumption problem solved through utility models.
The results of both examples demonstrate the effectiveness of our proposed algorithm.
arXiv Detail & Related papers (2022-04-12T13:31:19Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that a phenomenon can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Adaptive First- and Second-Order Algorithms for Large-Scale Machine
Learning [3.0204520109309843]
We consider first- and second-order techniques to address continuous optimization problems in machine learning.
In the first-order case, we propose a framework of transition from semi-deterministic to quadratic regularization methods.
In the second-order case, we propose a novel first-order algorithm with adaptive sampling and adaptive step size.
arXiv Detail & Related papers (2021-11-29T18:10:00Z) - Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order
Information [37.70729542263343]
We present a novel adaptive optimization algorithm for large-scale machine learning problems.
Our method dynamically adapts the direction and step-size.
Our methodology does not require a tedious tuning rate tuning.
arXiv Detail & Related papers (2021-09-11T06:39:50Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - SHINE: SHaring the INverse Estimate from the forward pass for bi-level
optimization and implicit models [15.541264326378366]
In recent years, implicit deep learning has emerged as a method to increase the depth of deep neural networks.
The training is performed as a bi-level problem, and its computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix.
We propose a novel strategy to tackle this computational bottleneck from which many bi-level problems suffer.
arXiv Detail & Related papers (2021-06-01T15:07:34Z) - Second-order Neural Network Training Using Complex-step Directional
Derivative [41.4333906662624]
We introduce a numerical algorithm for second-order neural network training.
We tackle the practical obstacle of Hessian calculation by using the complex-step finite difference.
We believe our method will inspire a wide-range of new algorithms for deep learning and numerical optimization.
arXiv Detail & Related papers (2020-09-15T13:46:57Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems.
Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.