Related papers: Low-Loss Space in Neural Networks is Continuous and Fully Connected

URL: http://arxiv.org/abs/2505.02604v3
Date: Wed, 11 Jun 2025 01:28:36 GMT
Title: Low-Loss Space in Neural Networks is Continuous and Fully Connected
Authors: Yongding Tian, Zaid Al-Ars, Maksim Kitsak, Peter Hofstee
Abstract summary: We show that it is possible to connect two different minima with a path consisting of intermediate points that also have low loss. Our work also provides new visualization methods and opportunities to improve model generalization.
Score: 0.8212195887472242
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Visualizations of the loss landscape in neural networks suggest that minima are isolated points. However, both theoretical and empirical studies indicate that it is possible to connect two different minima with a path consisting of intermediate points that also have low loss. In this study, we propose a new algorithm which investigates low-loss paths in the full parameter space, not only between two minima. Our experiments on LeNet5, ResNet18, and Compact Convolutional Transformer architectures consistently demonstrate the existence of such continuous paths in the parameter space. These results suggest that the low-loss region is a fully connected and continuous space in the parameter space. Our findings provide theoretical insight into neural network over-parameterization, highlighting that parameters collectively define a high-dimensional low-loss space, implying parameter redundancy exists only within individual models and not throughout the entire low-loss space. Additionally, our work also provides new visualization methods and opportunities to improve model generalization by exploring the low-loss space that is closer to the origin.
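The idea of connecting two minima through low-loss intermediate points can be illustrated with a toy sketch. This is not the authors' algorithm: the two-parameter loss below, the names `toy_loss` and `max_loss_along`, and both paths are assumptions chosen only so that the minima form a connected set (the unit circle). A straight line between two minima crosses a high-loss barrier, while a curved path stays at low loss throughout.

```python
import numpy as np

def toy_loss(w):
    """Hypothetical toy loss whose minima form the unit circle: (||w||^2 - 1)^2."""
    return (np.dot(w, w) - 1.0) ** 2

def max_loss_along(path):
    """Largest loss over the points of a discretized path (the 'barrier')."""
    return max(toy_loss(w) for w in path)

# Two distinct minima of the toy loss.
w_a = np.array([1.0, 0.0])
w_b = np.array([-1.0, 0.0])

ts = np.linspace(0.0, 1.0, 101)

# Straight-line interpolation passes through the origin, where the loss is 1.
linear_path = [(1 - t) * w_a + t * w_b for t in ts]

# A curved path along the circle connects the same endpoints at near-zero loss.
arc_path = [np.array([np.cos(np.pi * t), np.sin(np.pi * t)]) for t in ts]

linear_barrier = max_loss_along(linear_path)
arc_barrier = max_loss_along(arc_path)

print(f"barrier on straight line: {linear_barrier:.3f}")
print(f"barrier on curved path:   {arc_barrier:.3e}")
```

In a real network the parameter space has millions of dimensions and the low-loss path has to be searched for rather than written down in closed form; the sketch only shows why a barrier on the straight line does not imply that the minima are isolated.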

Related papers

Simplicity Bias via Global Convergence of Sharpness Minimization [43.658859631741024]
We show that label noise SGD always minimizes the sharpness on the manifold of models with zero loss for two-layer networks. We also find a novel property of the trace of Hessian of the loss at approximate stationary points on the manifold of zero loss.
arXiv Detail & Related papers (2024-10-21T18:10:37Z)
A simple connection from loss flatness to compressed neural representations [3.5502600490147196]
Sharpness, a geometric measure in the parameter space that reflects the flatness of the loss landscape, has long been studied for its potential connections to neural network behavior. In this paper, we investigate how sharpness influences the local geometric features of neural representations in feature space.
arXiv Detail & Related papers (2023-10-03T03:36:29Z)
Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory. Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
FuNNscope: Visual microscope for interactively exploring the loss landscape of fully connected neural networks [77.34726150561087]
We show how to explore high-dimensional landscape characteristics of neural networks. We generalize observations on small neural networks to more complex systems. An interactive dashboard opens up a number of possible application networks.
arXiv Detail & Related papers (2022-04-09T16:41:53Z)
On the Omnipresence of Spurious Local Minima in Certain Neural Network Training Problems [0.0]
We study the loss landscape of training problems for deep artificial neural networks with a one-dimensional real output. It is shown that such problems possess a continuum of spurious (i.e., not globally optimal) local minima for all target functions that are not affine.
arXiv Detail & Related papers (2022-02-23T14:41:54Z)
Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry [3.712728573432119]
We develop a standardized parameterization in which all symmetries are removed, resulting in a toroidal topology. We derive a meaningful notion of the flatness of minimizers and of the geodesic paths connecting them. We also find that minimizers found by variants of gradient descent can be connected by zero-error paths with a single bend.
arXiv Detail & Related papers (2022-02-07T09:57:54Z)
Neighborhood Region Smoothing Regularization for Finding Flat Minima In Deep Neural Networks [16.4654807047138]
We propose an effective regularization technique, called Neighborhood Region Smoothing (NRS). NRS tries to regularize the neighborhood region in weight space to yield approximate outputs. We empirically show that the minima found by NRS would have relatively smaller Hessian eigenvalues compared to the conventional method.
arXiv Detail & Related papers (2022-01-16T15:11:00Z)
InfoNeRF: Ray Entropy Minimization for Few-Shot Neural Volume Rendering [55.70938412352287]
We present an information-theoretic regularization technique for few-shot novel view synthesis based on neural implicit representation. The proposed approach minimizes potential reconstruction inconsistency that happens due to insufficient viewpoints. We achieve consistently improved performance compared to existing neural view synthesis methods by large margins on multiple standard benchmarks.
arXiv Detail & Related papers (2021-12-31T11:56:01Z)
On Connectivity of Solutions in Deep Learning: The Role of Over-parameterization and Feature Quality [21.13299067136635]
We present a novel condition for ensuring the connectivity of two arbitrary points in parameter space. This condition is provably milder than dropout stability, and it provides a connection between the problem of finding low-loss paths and the memorization capacity of neural nets.
arXiv Detail & Related papers (2021-02-18T23:44:08Z)
Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow. We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs). In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit. We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
Effective Version Space Reduction for Convolutional Neural Networks [61.84773892603885]
In active learning, sampling bias could pose a serious inconsistency problem and hinder the algorithm from finding the optimal hypothesis. We examine active learning with convolutional neural networks through the principled lens of version space reduction.
arXiv Detail & Related papers (2020-06-22T17:40:03Z)
Avoiding Spurious Local Minima in Deep Quadratic Networks [0.0]
We characterize the landscape of the mean squared nonlinear error for networks with neural activation functions. We prove that deepized neural networks with quadratic activations benefit from similar landscape properties.
arXiv Detail & Related papers (2019-12-31T22:31:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.