Provable Guarantees for Neural Networks via Gradient Feature Learning
- URL: http://arxiv.org/abs/2310.12408v1
- Date: Thu, 19 Oct 2023 01:45:37 GMT
- Title: Provable Guarantees for Neural Networks via Gradient Feature Learning
- Authors: Zhenmei Shi, Junyi Wei, Yingyu Liang
- Abstract summary: This work proposes a unified analysis framework for two-layer networks trained by gradient descent.
The framework is centered around the principle of feature learning from gradients, and its effectiveness is demonstrated by applications in several prototypical problems.
- Score: 15.413985018920018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks have achieved remarkable empirical performance, while the
current theoretical analysis is not adequate for understanding their success,
e.g., the Neural Tangent Kernel approach fails to capture their key feature
learning ability, while recent analyses on feature learning are typically
problem-specific. This work proposes a unified analysis framework for two-layer
networks trained by gradient descent. The framework is centered around the
principle of feature learning from gradients, and its effectiveness is
demonstrated by applications in several prototypical problems, such as mixtures
of Gaussians and parity functions. The framework also sheds light on
interesting network learning phenomena such as feature learning beyond kernels
and the lottery ticket hypothesis.
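To make the analyzed setting concrete, below is a minimal sketch of a two-layer ReLU network trained by plain gradient descent on a sparse parity task, one of the prototypical problems mentioned in the abstract. This is an illustrative assumption, not the paper's construction: the dimension, sparsity, width, initialization, squared loss, step size, and number of steps are all arbitrary choices.

```python
# Sketch only: a two-layer network trained by full-batch gradient descent on sparse parity.
# All hyperparameters and the loss are illustrative assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
d, k, n, width, lr, steps = 20, 3, 2000, 256, 0.05, 500

# Inputs uniform over {-1, +1}^d; the label is the parity of the first k coordinates.
X = rng.choice([-1.0, 1.0], size=(n, d))
y = np.prod(X[:, :k], axis=1)                      # targets in {-1, +1}

# Two-layer ReLU network f(x) = a^T relu(W x).
W = rng.normal(scale=1.0 / np.sqrt(d), size=(width, d))
a = rng.choice([-1.0, 1.0], size=width) / np.sqrt(width)

for _ in range(steps):
    pre = X @ W.T                                  # (n, width) pre-activations
    hid = np.maximum(pre, 0.0)                     # ReLU features
    err = hid @ a - y                              # residuals under squared loss
    grad_a = hid.T @ err / n
    grad_W = ((err[:, None] * (pre > 0.0)) * a).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

preds = np.sign(np.maximum(X @ W.T, 0.0) @ a)
print(f"training accuracy after {steps} GD steps: {np.mean(preds == y):.3f}")
```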
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z) - Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z) - How Does Gradient Descent Learn Features -- A Local Analysis for Regularized Two-Layer Neural Networks [18.809547338077905]
The ability of learning useful features is one of the major advantages of neural networks.
Recent works show that neural networks can operate in a neural tangent kernel (NTK) regime that does not allow feature learning.
arXiv Detail & Related papers (2024-06-03T20:15:28Z) - Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural
Networks [49.808194368781095]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks.
This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z) - Simple initialization and parametrization of sinusoidal networks via
their kernel bandwidth [92.25666446274188]
Neural networks with sinusoidal activations have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth.
arXiv Detail & Related papers (2022-11-26T07:41:48Z) - Proxy Convexity: A Unified Framework for the Analysis of Neural Networks
Trained by Gradient Descent [95.94432031144716]
We propose a unified non-convex optimization framework for the analysis of neural network training.
We show that existing guarantees for networks trained by gradient descent can be unified through this framework.
arXiv Detail & Related papers (2021-06-25T17:45:00Z) - Statistical Mechanical Analysis of Catastrophic Forgetting in Continual
Learning with Teacher and Student Networks [5.209145866174911]
When a computational system continuously learns from an ever-changing environment, it rapidly forgets its past experiences.
We provide the theoretical framework for analyzing catastrophic forgetting by using teacher-student learning.
We find that the network can avoid catastrophic forgetting when the similarity among the input distributions is small and the similarity among the input-output relationships of the target functions is large.
arXiv Detail & Related papers (2021-05-16T09:02:48Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z) - Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks
Trained with the Logistic Loss [0.0]
Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks.
We analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations.
arXiv Detail & Related papers (2020-02-11T15:42:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences.