Learning to Reason with Neural Networks: Generalization, Unseen Data and
Boolean Measures
- URL: http://arxiv.org/abs/2205.13647v1
- Date: Thu, 26 May 2022 21:53:47 GMT
- Title: Learning to Reason with Neural Networks: Generalization, Unseen Data and
Boolean Measures
- Authors: Emmanuel Abbe, Samy Bengio, Elisabetta Cornacchia, Jon Kleinberg, Aryo
Lotfi, Maithra Raghu, Chiyuan Zhang
- Abstract summary: This paper considers the Pointer Value Retrieval (PVR) benchmark introduced in [ZRKB21], where a 'reasoning' function acts on a string of digits to produce the label.
It is first shown that in order to learn logical functions with gradient descent on symmetric neural networks, the generalization error can be lower-bounded in terms of the noise-stability of the target function.
- Score: 44.87247707099189
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper considers the Pointer Value Retrieval (PVR) benchmark introduced
in [ZRKB21], where a 'reasoning' function acts on a string of digits to produce
the label. More generally, the paper considers the learning of logical
functions with gradient descent (GD) on neural networks. It is first shown that
in order to learn logical functions with gradient descent on symmetric neural
networks, the generalization error can be lower-bounded in terms of the
noise-stability of the target function, supporting a conjecture made in
[ZRKB21]. It is then shown that in the distribution shift setting, when the
data withholding corresponds to freezing a single feature (referred to as
canonical holdout), the generalization error of gradient descent admits a tight
characterization in terms of the Boolean influence for several relevant
architectures. This is shown on linear models and supported experimentally on
other models such as MLPs and Transformers. In particular, this puts forward
the hypothesis that for such architectures and for learning logical functions
such as PVR functions, GD tends to have an implicit bias towards low-degree
representations, which in turn gives the Boolean influence for the
generalization error under quadratic loss.
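For intuition, the Boolean influence of coordinate i on a function f over {0,1}^n is Inf_i(f) = Pr_x[f(x) != f(x^(i))], where x^(i) flips the i-th bit. Below is a minimal Python sketch, not taken from the paper, of a PVR-style function together with a Monte Carlo influence estimator; the pointer width, window size, parity aggregation, and the names pvr and boolean_influence are illustrative assumptions.

    import random

    # Illustrative PVR-style Boolean function (assumed setup, not the paper's
    # exact benchmark): the first P bits form a pointer into the remaining
    # N_REST bits, and the label is the parity of a W-bit window starting at
    # the pointed position.
    P, W, N_REST = 3, 2, 8
    N = P + N_REST

    def pvr(x):
        ptr = int("".join(map(str, x[:P])), 2) % (N_REST - W + 1)
        return sum(x[P + ptr : P + ptr + W]) % 2

    def boolean_influence(f, i, n, samples=100_000):
        # Monte Carlo estimate of Inf_i(f) = Pr_x[f(x) != f(x with bit i flipped)].
        flips = 0
        for _ in range(samples):
            x = [random.randint(0, 1) for _ in range(n)]
            y = list(x)
            y[i] ^= 1  # flip coordinate i
            flips += f(x) != f(y)
        return flips / samples

    for i in range(N):
        print(f"Inf_{i}(pvr) ~= {boolean_influence(pvr, i, N):.3f}")

Comparing such influence estimates against the test error of a model trained with coordinate i frozen (the canonical holdout above) is one way to probe the Boolean-influence characterization the paper describes.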
Related papers
- Learning local discrete features in explainable-by-design convolutional neural networks [0.0]
We introduce an explainable-by-design convolutional neural network (CNN) based on the lateral inhibition mechanism.
The model includes a predictor, a high-accuracy CNN with residual or dense skip connections.
By collecting observations and directly calculating probabilities, we can explain causal relationships between motifs of adjacent levels.
arXiv Detail & Related papers (2024-10-31T18:39:41Z) - What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding [67.59552859593985]
Graph Transformers, which incorporate self-attention and positional encoding, have emerged as a powerful architecture for various graph learning tasks.
This paper introduces the first theoretical investigation of a shallow Graph Transformer for semi-supervised classification.
arXiv Detail & Related papers (2024-06-04T05:30:16Z) - The twin peaks of learning neural networks [3.382017614888546]
Recent works demonstrated the existence of a double-descent phenomenon for the generalization error of neural networks.
We explore a link between this phenomenon and the increase of complexity and sensitivity of the function represented by neural networks.
arXiv Detail & Related papers (2024-01-23T10:09:14Z) - Accelerated Neural Network Training with Rooted Logistic Objectives [13.400503928962756]
We derive a novel sequence of strictly convex functions that are at least as strict as the logistic loss.
Our results illustrate that training with the rooted loss function converges faster and yields performance improvements.
arXiv Detail & Related papers (2023-10-05T20:49:48Z) - Generalizing Backpropagation for Gradient-Based Interpretability [103.2998254573497]
We show that the gradient of a model is a special case of a more general formulation using semirings.
This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics.
arXiv Detail & Related papers (2023-07-06T15:19:53Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Random Feature Amplification: Feature Learning and Generalization in
Neural Networks [44.431266188350655]
We provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent.
We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate.
arXiv Detail & Related papers (2022-02-15T18:18:22Z) - Modeling Implicit Bias with Fuzzy Cognitive Maps [0.0]
This paper presents a Fuzzy Cognitive Map model to quantify implicit bias in structured datasets.
We introduce a new reasoning mechanism equipped with a normalization-like transfer function that prevents neurons from saturating.
arXiv Detail & Related papers (2021-12-23T17:04:12Z) - Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the gradient flow of the loss function.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.