Curvature in the Looking-Glass: Optimal Methods to Exploit Curvature of Expectation in the Loss Landscape
- URL: http://arxiv.org/abs/2411.16914v1
- Date: Mon, 25 Nov 2024 20:32:57 GMT
- Title: Curvature in the Looking-Glass: Optimal Methods to Exploit Curvature of Expectation in the Loss Landscape
- Authors: Jed A. Duersch, Tommie A. Catanach, Alexander Safonov, Jeremy Wendt
- Abstract summary: We present a new conceptual framework to understand how curvature of expected changes in loss emerges in architectures with many rectified linear units.
Our derivations show how these discontinuities combine to form a glass-like structure, similar to amorphous solids that contain microscopic domains of strong, but random, atomic alignment.
We derive the optimal modification to quasi-Newton steps that incorporate both glass and Hessian terms, as well as certain exactness properties that are possible with Nesterov-accelerated gradient updates.
- Score: 41.94295877935867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Harnessing the local topography of the loss landscape is a central challenge in advanced optimization tasks. By accounting for the effect of potential parameter changes, we can alter the model more efficiently. Contrary to standard assumptions, we find that the Hessian does not always approximate loss curvature well, particularly near gradient discontinuities, which commonly arise in deep learning architectures. We present a new conceptual framework to understand how curvature of expected changes in loss emerges in architectures with many rectified linear units. Each ReLU creates a parameter boundary that, when crossed, induces a pseudorandom gradient perturbation. Our derivations show how these discontinuities combine to form a glass-like structure, similar to amorphous solids that contain microscopic domains of strong, but random, atomic alignment. By estimating the density of the resulting gradient variations, we can bound how the loss may change with parameter movement. Our analysis includes the optimal kernel and sample distribution for approximating glass density from ordinary gradient evaluations. We also derive the optimal modification to quasi-Newton steps that incorporate both glass and Hessian terms, as well as certain exactness properties that are possible with Nesterov-accelerated gradient updates. Our algorithm, Alice, tests these techniques to determine which curvature terms are most impactful for training a given architecture and dataset. Additional safeguards enforce stable exploitation through step bounds that expand on the functionality of Adam. These theoretical and experimental tools lay groundwork to improve future efforts (e.g., pruning and quantization) by providing new insight into the loss landscape.
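To make the abstract's central quantity more concrete, the sketch below shows one naive way to probe how rapidly gradients vary around a parameter point of a small ReLU network, using nothing but ordinary gradient evaluations. It is an illustration only: the toy two-layer regression model, the function names relu_net_grad and probe_gradient_variation, the uniform sampling on a sphere, and the simple averaged difference are assumptions for the example, not the paper's optimal kernel or sample distribution, and the Alice algorithm itself is not reproduced here.

```python
import numpy as np

def relu_net_grad(theta, x, y, hidden=8):
    """Loss gradient for a toy one-hidden-layer ReLU regression model.

    Hypothetical example model: pred = w2 . relu(W1 @ x + b1), squared loss.
    theta packs W1 (hidden x d), b1 (hidden), w2 (hidden).
    """
    d = x.shape[0]
    W1 = theta[: hidden * d].reshape(hidden, d)
    b1 = theta[hidden * d : hidden * d + hidden]
    w2 = theta[hidden * d + hidden :]
    z = W1 @ x + b1
    a = np.maximum(z, 0.0)            # ReLUs: the source of gradient discontinuities
    r = w2 @ a - y                    # residual of the squared loss 0.5 * r**2
    mask = (z > 0.0).astype(float)
    grad_w2 = r * a
    grad_b1 = r * w2 * mask
    grad_W1 = np.outer(grad_b1, x)
    return np.concatenate([grad_W1.ravel(), grad_b1, grad_w2])

def probe_gradient_variation(theta, x, y, radius=1e-2, n_samples=64, seed=0):
    """Average gradient change over random parameter moves of a fixed radius.

    A crude stand-in for the glass-density estimate: the paper derives an
    optimal kernel and sample distribution; this uses uniform directions only.
    """
    rng = np.random.default_rng(seed)
    g0 = relu_net_grad(theta, x, y)
    diffs = []
    for _ in range(n_samples):
        u = rng.normal(size=theta.shape)
        u *= radius / np.linalg.norm(u)   # random direction, fixed step length
        diffs.append(np.linalg.norm(relu_net_grad(theta + u, x, y) - g0))
    return float(np.mean(diffs))

# Toy usage: random data and parameters, probed at two radii.
rng = np.random.default_rng(0)
d, hidden = 4, 8
theta = rng.normal(size=hidden * d + hidden + hidden)
x, y = rng.normal(size=d), 1.0
print(probe_gradient_variation(theta, x, y, radius=1e-3))
print(probe_gradient_variation(theta, x, y, radius=1e-1))
```

In this picture, a probe value much larger than what a local quadratic (Hessian) model would predict signals that ReLU boundary crossings, rather than smooth curvature, dominate the expected change in loss over steps of that radius.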
Related papers
- Data-Driven Adaptive Gradient Recovery for Unstructured Finite Volume Computations [0.0]
We present a novel data-driven approach for enhancing gradient reconstruction in unstructured finite volume methods for hyperbolic conservation laws.
Our approach extends previous structured-grid methodologies to unstructured meshes through a modified DeepONet architecture.
The proposed algorithm is faster and more accurate than the traditional second-order finite volume solver.
arXiv Detail & Related papers (2025-07-22T13:23:57Z) - Navigating loss manifolds via rigid body dynamics: A promising avenue for robustness and generalisation [11.729464930866483]
Training large neural networks through gradient-based optimization requires navigating high-dimensional loss landscapes.
We propose an alternative that simultaneously reduces this dependence and avoids sharp minima.
arXiv Detail & Related papers (2025-05-26T05:26:21Z) - Directional Smoothness and Gradient Methods: Convergence and Adaptivity [16.779513676120096]
We develop new sub-optimality bounds for gradient descent that depend on the conditioning of the objective along the path of optimization.
Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper-bounds on the objective.
We prove that the Polyak step-size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness.
arXiv Detail & Related papers (2024-03-06T22:24:05Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow trading off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - Charting the Topography of the Neural Network Landscape with Thermal-Like Noise [0.0]
Training neural networks is a complex, high-dimensional, non-quadratic, and noisy optimization problem.
We use Langevin dynamics methods to study a network trained on a classification task with random data.
We find that the resulting dynamics are effectively low-dimensional, with a dimension that can be readily obtained from the fluctuations.
We explain this behavior by a simplified loss model which is analytically tractable and reproduces the observed fluctuation statistics. (A minimal sketch of a Langevin-style update appears after this list.)
arXiv Detail & Related papers (2023-04-03T20:01:52Z) - Are Gradients on Graph Structure Reliable in Gray-box Attacks? [56.346504691615934]
Previous gray-box attackers employ gradients from the surrogate model to locate the vulnerable edges to perturb the graph structure.
In this paper, we discuss and analyze the errors caused by the unreliability of the structural gradients.
We propose a novel attack model with methods to reduce the errors inside the structural gradients.
arXiv Detail & Related papers (2022-08-07T06:43:32Z) - Error-Correcting Neural Networks for Two-Dimensional Curvature Computation in the Level-Set Method [0.0]
We present an error-neural-modeling-based strategy for approximating two-dimensional curvature in the level-set method.
Our main contribution is a redesigned hybrid solver that relies on numerical schemes to enable machine-learning operations on demand.
arXiv Detail & Related papers (2022-01-22T05:14:40Z) - Learning High-Precision Bounding Box for Rotated Object Detection via Kullback-Leibler Divergence [100.6913091147422]
Existing rotated object detectors are mostly inherited from the horizontal detection paradigm.
In this paper, we are motivated to change the design of the rotation regression loss from an induction paradigm to a deduction methodology.
arXiv Detail & Related papers (2021-06-03T14:29:19Z) - Improved Analysis of Clipping Algorithms for Non-convex Optimization [19.507750439784605]
Recently, Zhang et al. (2019) showed that clipped (stochastic) Gradient Descent (GD) converges faster than vanilla GD/SGD.
Experiments confirm the superiority of clipping-based methods in deep learning tasks. (A minimal sketch of a norm-clipped gradient step appears after this list.)
arXiv Detail & Related papers (2020-10-05T14:36:59Z) - Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties [18.973116252065278]
We propose a novel method called Expectigrad, which adjusts its step size according to a per-component unweighted mean of all historical gradients and computes its momentum term jointly between the numerator and denominator.
We prove that Expectigrad cannot diverge on any instance of the optimization problem known to cause Adam to diverge.
arXiv Detail & Related papers (2020-10-03T13:34:27Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
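The thermal-like-noise entry above studies the loss landscape through Langevin dynamics; below is a minimal sketch of the standard overdamped Langevin update it builds on (gradient descent plus temperature-scaled Gaussian noise). The function name langevin_step, the step size, the temperature, and the toy quadratic objective are assumptions for illustration, not that paper's exact protocol.

```python
import numpy as np

def langevin_step(theta, grad_fn, lr=1e-3, temperature=1e-4, rng=None):
    """One overdamped Langevin update: gradient descent plus thermal noise.

    The noise scale sqrt(2 * lr * temperature) is the standard choice that
    makes the stationary distribution proportional to exp(-loss / temperature).
    """
    rng = np.random.default_rng(rng)
    noise = rng.normal(size=theta.shape)
    return theta - lr * grad_fn(theta) + np.sqrt(2.0 * lr * temperature) * noise

# Toy usage on a quadratic loss 0.5 * ||theta||^2 (its gradient is theta itself).
theta = np.ones(10)
for _ in range(1000):
    theta = langevin_step(theta, grad_fn=lambda t: t, lr=1e-2, temperature=1e-3)
# Fluctuation statistics of trajectories like this are the kind of signal
# that paper reads off to characterize the local landscape.
```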
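The clipping entry above concerns clipped gradient descent; below is a minimal sketch of the common globally norm-clipped step (rescale the gradient whenever its norm exceeds a threshold). The function name clipped_gd_step and the learning-rate and threshold values are assumptions for illustration; the exact clipping rule analyzed in that paper may differ in detail, and the cited results also cover the stochastic variant.

```python
import numpy as np

def clipped_gd_step(theta, grad, lr=0.1, clip_norm=1.0):
    """Gradient step with global norm clipping.

    The gradient is rescaled to have norm at most clip_norm, so the step
    length is capped at lr * clip_norm regardless of how steep the loss is.
    """
    g_norm = np.linalg.norm(grad)
    scale = min(1.0, clip_norm / (g_norm + 1e-12))  # avoid division by zero
    return theta - lr * scale * grad
```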