Charting the Topography of the Neural Network Landscape with
Thermal-Like Noise
- URL: http://arxiv.org/abs/2304.01335v2
- Date: Tue, 18 Apr 2023 06:25:31 GMT
- Title: Charting the Topography of the Neural Network Landscape with
Thermal-Like Noise
- Authors: Theo Jules, Gal Brener, Tal Kachman, Noam Levi, Yohai Bar-Sinai
- Abstract summary: Training neural networks is a complex, high-dimensional, non-convex and noisy optimization problem.
We use Langevin dynamics to study the loss landscape of an over-parameterized fully connected network performing a classification task on random data.
We find that the low-loss region is a low-dimensional manifold whose dimension can be readily obtained from the fluctuations.
We explain this behavior by a simplified loss model which is analytically tractable and reproduces the observed fluctuation statistics.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The training of neural networks is a complex, high-dimensional, non-convex
and noisy optimization problem whose theoretical understanding is interesting
both from an applicative perspective and for fundamental reasons. A core
challenge is to understand the geometry and topography of the landscape that
guides the optimization. In this work, we employ standard Statistical Mechanics
methods, namely, phase-space exploration using Langevin dynamics, to study this
landscape for an over-parameterized fully connected network performing a
classification task on random data. Analyzing the fluctuation statistics, in
analogy to thermal dynamics at a constant temperature, we infer a clear
geometric description of the low-loss region. We find that it is a
low-dimensional manifold whose dimension can be readily obtained from the
fluctuations. Furthermore, this dimension is controlled by the number of data
points that reside near the classification decision boundary. Importantly, we
find that a quadratic approximation of the loss near the minimum is
fundamentally inadequate due to the exponential nature of the decision boundary
and the flatness of the low-loss region. This causes the dynamics to sample
regions with higher curvature at higher temperatures, while producing
quadratic-like statistics at any given temperature. We explain this behavior by
a simplified loss model which is analytically tractable and reproduces the
observed fluctuation statistics.
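
As a concrete illustration of the method described above, here is a minimal sketch (not the authors' code) of phase-space exploration with overdamped Langevin dynamics on a toy two-layer tanh network with a logistic loss, followed by an equipartition-style readout of the effective dimension from the loss fluctuations. The network sizes, step size, temperatures, and step counts are all illustrative assumptions.

```python
# Minimal sketch, assuming a toy setup: overdamped Langevin exploration of the
# loss landscape of a small two-layer tanh network on random data, with an
# equipartition-style estimate of the effective dimension from the fluctuations.
import numpy as np

rng = np.random.default_rng(0)

# Random binary-classification data (tiny stand-in for the paper's setup).
n_samples, n_in, n_hidden = 20, 10, 64
X = rng.standard_normal((n_samples, n_in))
y = rng.choice([-1.0, 1.0], size=n_samples)

def unpack(theta):
    W1 = theta[:n_in * n_hidden].reshape(n_in, n_hidden)
    w2 = theta[n_in * n_hidden:]
    return W1, w2

def loss_and_grad(theta):
    """Mean logistic loss of a two-layer tanh network, with its gradient."""
    W1, w2 = unpack(theta)
    h = np.tanh(X @ W1)                       # hidden activations
    f = h @ w2                                # scalar output per sample
    m = y * f                                 # classification margins
    loss = np.logaddexp(0.0, -m).mean()
    sig = 0.5 * (1.0 - np.tanh(0.5 * m))      # 1 / (1 + e^m), overflow-safe
    df = -y * sig / n_samples                 # dL/df
    gw2 = h.T @ df
    dh = np.outer(df, w2) * (1.0 - h**2)      # backprop through tanh
    gW1 = X.T @ dh
    return loss, np.concatenate([gW1.ravel(), gw2])

def langevin(theta, T, eta=1e-2, burn=5000, steps=20000):
    """theta <- theta - eta * grad + sqrt(2 * eta * T) * xi,  xi ~ N(0, I)."""
    losses = []
    for step in range(burn + steps):
        loss, grad = loss_and_grad(theta)
        theta = (theta - eta * grad
                 + np.sqrt(2.0 * eta * T) * rng.standard_normal(theta.size))
        if step >= burn:
            losses.append(loss)
    return theta, np.array(losses)

# Relax into the low-loss region first (T = 0 is plain gradient descent) ...
theta0 = 0.5 * rng.standard_normal(n_in * n_hidden + n_hidden)
theta_min, _ = langevin(theta0, T=0.0, burn=0, steps=5000)

# ... then sample the loss fluctuations at a few small temperatures.
means = []
for T in (1e-5, 1e-4, 1e-3):
    _, losses = langevin(theta_min.copy(), T)
    means.append((T, losses.mean()))
    print(f"T = {T:.0e}   <L> = {losses.mean():.3e}")

# Equipartition for a locally quadratic landscape gives <L> ~ L_min + d_eff*T/2,
# so the slope of <L> versus T estimates the number of stiff directions sampled.
for (T1, L1), (T2, L2) in zip(means, means[1:]):
    print(f"d_eff between T={T1:.0e} and T={T2:.0e}: "
          f"{2.0 * (L2 - L1) / (T2 - T1):.1f}")
```

If the landscape were truly quadratic, the slope of the mean loss versus temperature would be constant; a temperature-dependent effective dimension is precisely the non-quadratic signature the abstract describes, with higher temperatures sampling regions of higher curvature.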
Related papers
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Information-Theoretic Thresholds for Planted Dense Cycles [52.076657911275525]
We study a random graph model for small-world networks which are ubiquitous in social and biological sciences.
For both detection and recovery of the planted dense cycle, we characterize the information-theoretic thresholds in terms of $n$, $\tau$, and an edge-wise signal-to-noise ratio $\lambda$.
arXiv Detail & Related papers (2024-02-01T03:39:01Z) - On the ISS Property of the Gradient Flow for Single Hidden-Layer Neural
Networks with Linear Activations [0.0]
We investigate the effects of overfitting on the robustness of gradient-descent training when subject to uncertainty on the gradient estimation.
We show that the general overparametrized formulation introduces a set of spurious equilibria which lie outside the set where the loss function is minimized.
arXiv Detail & Related papers (2023-05-17T02:26:34Z) - Dynamic Causal Explanation Based Diffusion-Variational Graph Neural
Network for Spatio-temporal Forecasting [60.03169701753824]
We propose a novel Dynamic Diffusion-Variational Graph Neural Network (DVGNN) for spatio-temporal forecasting.
The proposed DVGNN model outperforms state-of-the-art approaches and achieves outstanding Root Mean Squared Error results.
arXiv Detail & Related papers (2023-05-16T11:38:19Z) - A physics and data co-driven surrogate modeling approach for temperature
field prediction on irregular geometric domain [12.264200001067797]
We propose a novel physics and data co-driven surrogate modeling method for temperature field prediction.
Numerical results demonstrate that our method can significantly improve prediction accuracy on a smaller dataset.
arXiv Detail & Related papers (2022-03-15T08:43:24Z) - Physics-informed Convolutional Neural Networks for Temperature Field
Prediction of Heat Source Layout without Labeled Data [9.71214034180507]
This paper develops a physics-informed convolutional neural network (CNN) as a surrogate for thermal simulation.
The network can learn a mapping from heat source layout to the steady-state temperature field without labeled data, which amounts to solving an entire family of partial differential equations (PDEs); a sketch of this idea appears after this list.
arXiv Detail & Related papers (2021-09-26T03:24:23Z) - The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer
Linear Networks [51.1848572349154]
Benign overfitting is the phenomenon whereby neural network models that perfectly fit noisy data can generalize well to unseen test data.
We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z) - Learning the structure of wind: A data-driven nonlocal turbulence model
for the atmospheric boundary layer [0.0]
We develop a novel data-driven approach to modeling the atmospheric boundary layer.
This approach leads to a nonlocal, anisotropic synthetic turbulence model which we refer to as the deep rapid distortion (DRD) model.
arXiv Detail & Related papers (2021-07-23T06:41:33Z) - The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations,
and Anomalous Diffusion [29.489737359897312]
We study the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD).
We show that the key ingredient driving these dynamics is not the original training loss, but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents, which cause oscillations in phase space.
arXiv Detail & Related papers (2021-07-19T20:18:57Z) - Towards Deeper Graph Neural Networks [63.46470695525957]
Graph convolutions perform neighborhood aggregation and represent one of the most important graph operations.
Several recent studies attribute the performance deterioration of deeper models to the over-smoothing issue.
We propose Deep Adaptive Graph Neural Network (DAGNN) to adaptively incorporate information from large receptive fields.
arXiv Detail & Related papers (2020-07-18T01:11:14Z) - A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation (the underlying identity is sketched after this list).
Compared with existing schemes, the Wasserstein gradient flow is a smoother and near-optimal numerical scheme for approximating real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)
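
For the physics-informed CNN entry above, here is a hedged sketch of the general idea, not that paper's architecture: a small CNN maps a heat-source layout to a temperature field and is trained only on the finite-difference residual of the steady heat equation plus a Dirichlet boundary term, with no labeled temperature data. The grid size, conductivity, source layouts, and network shape are illustrative assumptions.

```python
# Hedged sketch of a physics-informed surrogate (not the paper's code): train a
# CNN so that k * laplacian(T) + q = 0 holds on the grid interior and T = 0 on
# the boundary, using no labeled temperature data.
import torch
import torch.nn as nn
import torch.nn.functional as F

# 5-point finite-difference Laplacian as a fixed (non-trainable) conv kernel.
LAP = torch.tensor([[[[0., 1., 0.],
                      [1., -4., 1.],
                      [0., 1., 0.]]]])

class HeatSurrogate(nn.Module):
    """Tiny CNN mapping a source layout q(x, y) to a temperature field T(x, y)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, q):
        return self.net(q)

def physics_loss(T, q, h=1.0, k=1.0):
    """PDE residual on interior points plus Dirichlet T = 0 on the boundary."""
    lap = F.conv2d(T, LAP) / h**2              # interior Laplacian, (B,1,H-2,W-2)
    residual = k * lap + q[:, :, 1:-1, 1:-1]   # k * laplacian(T) + q should vanish
    boundary = (T[:, :, 0, :]**2).mean() + (T[:, :, -1, :]**2).mean() \
             + (T[:, :, :, 0]**2).mean() + (T[:, :, :, -1]**2).mean()
    return (residual**2).mean() + boundary

model = HeatSurrogate()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
q = torch.zeros(8, 1, 32, 32)
for i in range(8):
    r, c = torch.randint(4, 24, (2,))
    q[i, 0, r:r+6, c:c+6] = 1.0                # a random block heat source per layout
for step in range(500):
    loss = physics_loss(model(q), q)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the loss is the PDE residual itself, every source layout in the batch provides a training signal, which is the sense in which training the surrogate amounts to solving a family of PDEs at once.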
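For the energy-based-model entry, the identity it builds on is the classical JKO result, stated here from general knowledge rather than from that paper: Fokker-Planck dynamics is the Wasserstein-2 gradient flow of the relative entropy with respect to the Gibbs measure.

```latex
% Fokker--Planck as a Wasserstein-2 gradient flow (classical JKO identity).
% With F(\rho) = KL(\rho || \pi) and Gibbs measure \pi \propto e^{-V}, one has
% \delta F / \delta \rho = \log(\rho / \pi) + 1, hence:
\[
  \partial_t \rho_t
  = \nabla \cdot \!\left( \rho_t \, \nabla \frac{\delta F}{\delta \rho}[\rho_t] \right)
  = \nabla \cdot (\rho_t \nabla V) + \Delta \rho_t,
  \qquad
  F(\rho) = \int \rho \log \frac{\rho}{\pi} \, dx,
  \quad
  \pi \propto e^{-V}.
\]
```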
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.