Related papers: Neural network optimization strategies and the topography of the loss landscape

URL: http://arxiv.org/abs/2602.21276v1
Date: Tue, 24 Feb 2026 17:49:13 GMT
Title: Neural network optimization strategies and the topography of the loss landscape
Authors: Jianneng Yu, Alexandre V. Morozov
Abstract summary: We investigate neural network learning by stochastic gradient descent (SGD). We use several computational tools to investigate neural network parameters obtained by SGD and by a non-stochastic quasi-Newton method.
Score: 45.88028371034407
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Neural networks are trained by optimizing multi-dimensional sets of fitting parameters on non-convex loss landscapes. Low-loss regions of the landscapes correspond to the parameter sets that perform well on the training data. A key issue in machine learning is the performance of trained neural networks on previously unseen test data. Here, we investigate neural network training by stochastic gradient descent (SGD) - a non-convex global optimization algorithm which relies only on the gradient of the objective function. We contrast SGD solutions with those obtained via a non-stochastic quasi-Newton method, which utilizes curvature information to determine step direction and Golden Section Search to choose step size. We use several computational tools to investigate neural network parameters obtained by these two optimization methods, including kernel Principal Component Analysis and a novel, general-purpose algorithm for finding low-height paths between pairs of points on loss or energy landscapes, FourierPathFinder. We find that the choice of the optimizer profoundly affects the nature of the resulting solutions. SGD solutions tend to be separated by lower barriers than quasi-Newton solutions, even if both sets of solutions are regularized by early stopping to ensure adequate performance on test data. When allowed to fit extensively on the training data, quasi-Newton solutions occupy deeper minima on the loss landscapes that are not reached by SGD. These solutions are less generalizable to the test data, however. Overall, SGD explores smooth basins of attraction, while quasi-Newton optimization is capable of finding deeper, more isolated minima that are more spread out in the parameter space. Our findings help understand both the topography of the loss landscapes and the fundamental role of landscape exploration strategies in creating robust, transferable neural network models.
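Illustration: the abstract mentions choosing the step size by Golden Section Search along a direction supplied by the optimizer. The following is a minimal sketch of that idea only, not the authors' implementation. The toy quadratic loss, the search interval [0, 1], the function names, and the use of the negative gradient in place of a quasi-Newton direction are all assumptions made here for illustration.

```python
import math

def golden_section_search(f, a, b, tol=1e-8):
    """Minimize a unimodal 1-D function f on the interval [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while abs(b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new right probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new left probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

def loss(w):
    # Hypothetical toy loss: a quadratic bowl with its minimum at (1, -2).
    return (w[0] - 1.0) ** 2 + 10.0 * (w[1] + 2.0) ** 2

def grad(w):
    return [2.0 * (w[0] - 1.0), 20.0 * (w[1] + 2.0)]

def line_search_step(w, direction, max_step=1.0):
    """Move from w along direction, with the step chosen by golden section search."""
    def phi(t):
        return loss([wi + t * di for wi, di in zip(w, direction)])
    t = golden_section_search(phi, 0.0, max_step)
    return [wi + t * di for wi, di in zip(w, direction)], t

# Assumption: steepest descent stands in for the quasi-Newton direction.
w = [0.0, 0.0]
for _ in range(60):
    g = grad(w)
    w, t = line_search_step(w, [-gi for gi in g])

print(w, loss(w))
```

A quasi-Newton method would replace the negative gradient with a direction built from curvature information; the line search itself would stay the same.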

Related papers

Optimizing the Optimizer for Physics-Informed Neural Networks and Kolmogorov-Arnold Networks [3.758814046658822]
Physics-Informed Neural Networks (PINNs) have revolutionized the computation of PDE solutions by integrating partial differential equations (PDEs) into the neural network's training process as soft constraints. More recently, physics-informed Kolmogorov-Arnold networks (PIKANs) have also been shown to be effective and comparable in accuracy.
arXiv Detail & Related papers (2025-01-22T21:19:42Z)
PACMANN: Point Adaptive Collocation Method for Artificial Neural Networks [41.99844472131922]
PINNs minimize a loss function which includes the PDE residual determined for a set of collocation points. PACMANN incrementally moves collocation points toward regions of higher residuals using gradient-based optimization algorithms. Key features of the method include its low computational cost and simplicity of integration into existing physics-informed neural network pipelines.
arXiv Detail & Related papers (2024-11-29T11:31:11Z)
Improving Generalization of Deep Neural Networks by Optimum Shifting [33.092571599896814]
We propose a novel method called optimum shifting, which changes the parameters of a neural network from a sharp minimum to a flatter one. Our method is based on the observation that when the input and output of a neural network are fixed, the matrix multiplications within the network can be treated as systems of under-determined linear equations.
arXiv Detail & Related papers (2024-05-23T02:31:55Z)
Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems. However, PINNs are trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features. In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs to improve the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
Adaptive Self-supervision Algorithms for Physics-informed Neural Networks [59.822151945132525]
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function. We study the impact of the location of the collocation points on the trainability of these models. We propose a novel adaptive collocation scheme which progressively allocates more collocation points to areas where the model is making higher errors.
arXiv Detail & Related papers (2022-07-08T18:17:06Z)
Critical Investigation of Failure Modes in Physics-informed Neural Networks [0.9137554315375919]
We show that a physics-informed neural network with a composite formulation produces highly non-convex loss surfaces that are difficult to optimize. We also assess the training of both approaches on two elliptic problems with increasingly complex target solutions.
arXiv Detail & Related papers (2022-06-20T18:43:35Z)
Enhanced Exploration in Neural Feature Selection for Deep Click-Through Rate Prediction Models via Ensemble of Gating Layers [7.381829794276824]
The goal of neural feature selection (NFS) is to choose a relatively small subset of features with the best explanatory power. The gating approach inserts a set of differentiable binary gates to drop less informative features. To improve the exploration capacity of gradient-based solutions, we propose a simple but effective ensemble learning approach.
arXiv Detail & Related papers (2021-12-07T04:37:05Z)
Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the gradient flow of the loss function. We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
Persistent Neurons [4.061135251278187]
We propose a trajectory-based strategy that optimizes the learning task using information from previous solutions. Persistent neurons can be regarded as a method with gradient-informed bias where individual updates are corrupted by deterministic error terms. We evaluate the full and partial persistent model and show it can be used to boost the performance on a range of NN structures.
arXiv Detail & Related papers (2020-07-02T22:36:49Z)
The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints. Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z)
Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks. We propose a feature distortion method (Disout) for addressing the aforementioned problem. The superiority of the proposed feature map distortion for producing deep neural networks with higher testing performance is analyzed and demonstrated.
arXiv Detail & Related papers (2020-02-23T13:59:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.