On the distance between two neural networks and the stability of
learning
- URL: http://arxiv.org/abs/2002.03432v3
- Date: Fri, 8 Jan 2021 13:51:25 GMT
- Title: On the distance between two neural networks and the stability of
learning
- Authors: Jeremy Bernstein, Arash Vahdat, Yisong Yue, Ming-Yu Liu
- Abstract summary: This paper relates parameter distance to gradient breakdown for a broad class of nonlinear compositional functions.
The analysis leads to a new distance function called deep relative trust and a descent lemma for neural networks.
- Score: 59.62047284234815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper relates parameter distance to gradient breakdown for a broad class
of nonlinear compositional functions. The analysis leads to a new distance
function called deep relative trust and a descent lemma for neural networks.
Since the resulting learning rule seems to require little to no learning rate
tuning, it may unlock a simpler workflow for training deeper and more complex
neural networks. The Python code used in this paper is here:
https://github.com/jxbz/fromage.
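As a rough illustration of the kind of update the descent lemma suggests, here is a minimal sketch of a layer-wise relative step in the spirit of the paper's Fromage rule; the authors' actual optimizer is in the linked repository, and the function name, constants, and usage below are illustrative assumptions.

    import math
    import torch

    @torch.no_grad()
    def relative_step(params, lr=0.01, eps=1e-12):
        # Move each parameter tensor by a step sized relative to its own norm,
        # then rescale so the parameter norm does not drift upward over many steps.
        for p in params:
            if p.grad is None:
                continue
            w_norm, g_norm = p.norm(), p.grad.norm()
            if w_norm > 0 and g_norm > 0:
                p.add_(p.grad, alpha=-lr * (w_norm / (g_norm + eps)).item())
            else:
                p.add_(p.grad, alpha=-lr)
            p.mul_(1.0 / math.sqrt(1.0 + lr ** 2))

    # usage (illustrative): after loss.backward(), call
    #     relative_step(model.parameters(), lr=0.01)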
Related papers
- LinSATNet: The Positive Linear Satisfiability Neural Networks [116.65291739666303]
This paper studies how to introduce positive linear satisfiability constraints into neural networks.
We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions.
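For background only, here is a minimal sketch of the classic Sinkhorn iteration that this layer extends: alternating row and column rescalings push a positive matrix toward prescribed marginals, and every step is differentiable. It is not the LinSATNet layer itself, and the sizes below are illustrative assumptions.

    import torch

    def sinkhorn(scores, row_sums, col_sums, n_iters=50, eps=1e-8):
        # Rescale exp(scores) so its row/column sums approach the given marginals.
        P = torch.exp(scores)
        for _ in range(n_iters):
            P = P * (row_sums / (P.sum(dim=1) + eps)).unsqueeze(1)  # match row sums
            P = P * (col_sums / (P.sum(dim=0) + eps)).unsqueeze(0)  # match column sums
        return P

    P = sinkhorn(torch.randn(4, 4), row_sums=torch.ones(4), col_sums=torch.ones(4))
    print(P.sum(dim=1), P.sum(dim=0))  # both approximately all-ones (doubly stochastic)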
arXiv Detail & Related papers (2024-07-18T22:05:21Z)
- Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z)
- On the High Symmetry of Neural Network Functions [0.0]
Training neural networks means solving a high-dimensional optimization problem.
This paper shows how, due to the way neural networks are designed, the network function exhibits a very large symmetry in parameter space.
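As a concrete illustration of one such symmetry, the toy check below (a hypothetical two-layer network, not the paper's setup) permutes the hidden units together with the matching rows and columns of the weights, giving a different point in parameter space that computes exactly the same function.

    import torch

    torch.manual_seed(0)
    W1, b1 = torch.randn(8, 3), torch.randn(8)   # hidden layer: 8 units, 3 inputs
    W2, b2 = torch.randn(1, 8), torch.randn(1)   # output layer

    def f(x, W1, b1, W2, b2):
        return W2 @ torch.relu(W1 @ x + b1) + b2

    perm = torch.randperm(8)                     # relabel the hidden units
    x = torch.randn(3)
    print(torch.allclose(f(x, W1, b1, W2, b2),
                         f(x, W1[perm], b1[perm], W2[:, perm], b2)))  # True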
arXiv Detail & Related papers (2022-11-12T07:51:14Z)
- Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
arXiv Detail & Related papers (2022-06-15T17:58:34Z)
- Fast Adaptation with Linearized Neural Networks [35.43406281230279]
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions.
Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network.
In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation.
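A toy sketch of the general idea of a Jacobian-based kernel, k(x, x') = <df(x)/dtheta, df(x')/dtheta>, follows; the tiny scalar-output network and its sizes are illustrative assumptions rather than the paper's construction.

    import torch

    torch.manual_seed(0)
    net = torch.nn.Sequential(torch.nn.Linear(3, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))

    def param_grad(x):
        # Flattened gradient of the scalar output f(x) with respect to all parameters.
        grads = torch.autograd.grad(net(x).squeeze(), list(net.parameters()))
        return torch.cat([g.reshape(-1) for g in grads])

    def jacobian_kernel(x1, x2):
        # Inner product of parameter gradients: the kernel of the linearized network.
        return param_grad(x1) @ param_grad(x2)

    x, x_new = torch.randn(3), torch.randn(3)
    print(jacobian_kernel(x, x_new).item())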
arXiv Detail & Related papers (2021-03-02T03:23:03Z)
- Extremal learning: extremizing the output of a neural network in regression problems [0.0]
We show how to efficiently find extrema of a trained neural network in regression problems.
Finding the extremizing input of an approximated model is formulated as the training of an additional neural network.
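The paper formulates extremization as training an additional network; as a simpler stand-in, the sketch below runs plain gradient ascent on the input of a frozen (here randomly initialized, hypothetical) regressor to find an approximate maximizer.

    import torch

    model = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
    for p in model.parameters():
        p.requires_grad_(False)              # the (pretend-)trained weights stay frozen

    x = torch.zeros(2, requires_grad=True)   # candidate extremizing input
    opt = torch.optim.Adam([x], lr=1e-2)
    for _ in range(500):
        opt.zero_grad()
        loss = -model(x).squeeze()           # maximize the output = minimize its negative
        loss.backward()
        opt.step()
    print("approximate maximizer:", x.detach(), "value:", model(x).item())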
arXiv Detail & Related papers (2021-02-06T18:01:17Z)
- The Connection Between Approximation, Depth Separation and Learnability in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity.
We show that learnability of a target function with deep networks depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z)
- Towards Understanding Hierarchical Learning: Benefits of Neural Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representations can achieve improved sample complexity compared with learning from the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)