Generalization on the Unseen, Logic Reasoning and Degree Curriculum
- URL: http://arxiv.org/abs/2301.13105v2
- Date: Wed, 28 Jun 2023 15:41:49 GMT
- Title: Generalization on the Unseen, Logic Reasoning and Degree Curriculum
- Authors: Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Kevin Rizk
- Abstract summary: This paper considers the learning of logical functions with a focus on the generalization on the unseen (GOTU) setting.
We study how different network architectures trained by (S)GD perform under GOTU.
We provide evidence that for a class of network models including instances of Transformers, random features models, and diagonal linear networks, a min-degree-interpolator is learned on the unseen.
- Score: 33.777993397106584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper considers the learning of logical (Boolean) functions with a focus on the generalization on the unseen (GOTU) setting, a strong case of out-of-distribution generalization. This is motivated by the fact that the rich combinatorial nature of data in certain reasoning tasks (e.g., arithmetic/logic) makes representative data sampling challenging, and learning successfully under GOTU gives a first vignette of an 'extrapolating' or 'reasoning' learner. We then study how different network architectures trained by (S)GD perform under GOTU and provide both theoretical and experimental evidence that for a class of network models including instances of Transformers, random features models, and diagonal linear networks, a min-degree-interpolator is learned on the unseen. We also provide evidence that other instances with larger learning rates or mean-field networks reach leaky min-degree solutions. These findings lead to two implications: (1) we provide an explanation of the length generalization problem (e.g., Anil et al. 2022); (2) we introduce a curriculum learning algorithm called Degree-Curriculum that learns monomials more efficiently by incrementing supports.
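To make the GOTU setting and the min-degree-interpolator claim concrete, here is a minimal, self-contained sketch (not the paper's exact experimental setup): the target is f(x) = x0*x1 on {-1,1}^3, training sees only the half-cube x0 = 1 (where f coincides with the degree-1 function x1), and a random-features model, one of the model classes covered by the paper's theory, is fit by minimum-norm least squares, the solution gradient descent on the readout converges to. The feature count, ReLU activation, and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# The Boolean cube {-1, 1}^3; target f(x) = x0 * x1.
cube = np.array([[a, b, c] for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)], float)
y_all = cube[:, 0] * cube[:, 1]

mask = cube[:, 0] == 1                    # GOTU: the half x0 = -1 is never sampled
seen, y_seen = cube[mask], y_all[mask]
unseen, y_unseen = cube[~mask], y_all[~mask]

# Random-features model: fixed random ReLU features with a trained linear
# readout; D = 2000 is an arbitrary choice.
D = 2000
W, b = rng.normal(size=(3, D)) / np.sqrt(3), rng.normal(size=D)
phi = lambda X: np.maximum(X @ W + b, 0.0)

# Minimum-norm interpolating fit on the seen half only.
a_hat = np.linalg.lstsq(phi(seen), y_seen, rcond=None)[0]
pred = lambda X: phi(X) @ a_hat

# Fourier coefficients of the learned function over the whole cube.
subsets = [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
chis = np.stack([np.prod(cube[:, list(S)], axis=1) for S in subsets], axis=1)
coeffs = chis.T @ pred(cube) / len(cube)
print(dict(zip(["1", "x0", "x1", "x2", "x0x1", "x0x2", "x1x2", "x0x1x2"],
               np.round(coeffs, 3))))
print("unseen MSE vs x0*x1:", np.mean((pred(unseen) - y_unseen) ** 2))
```

Per the paper's claim, the Fourier mass should concentrate on the degree-1 character x1 (the min-degree interpolator of the seen data) rather than on x0x1; since x0*x1 = -x1 on the unseen half, a learner that picks the min-degree interpolator is maximally wrong there, which is exactly the GOTU failure mode the paper analyzes.

The Degree-Curriculum algorithm is summarized above only as "learns monomials more efficiently by incrementing supports"; the sketch below stages training data by input support size (number of -1 coordinates), with the cumulative staging and this notion of support being assumptions rather than the paper's exact procedure.

```python
import numpy as np

def degree_curriculum(X, y, train_step, max_support):
    """Curriculum sketch: present examples in rounds of non-decreasing support
    size, so low-degree structure can be fit before high-degree monomials.
    `train_step` is any learner update; staging details are assumptions."""
    support = (X == -1).sum(axis=1)       # support of a {-1,+1}-valued input
    for k in range(max_support + 1):
        idx = support <= k                # cumulative: keep earlier rounds
        if idx.any():
            train_step(X[idx], y[idx])
```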
Related papers
- What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding [67.59552859593985]
Graph Transformers, which incorporate self-attention and positional encoding, have emerged as a powerful architecture for various graph learning tasks.
This paper introduces the first theoretical investigation of a shallow Graph Transformer for semi-supervised classification.
arXiv Detail & Related papers (2024-06-04T05:30:16Z)
- Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning [89.89857766491475]
We propose a complex reasoning schema over KG upon large language models (LLMs)
We augment the arbitrary first-order logical queries via binary tree decomposition to stimulate the reasoning capability of LLMs.
Experiments across widely used datasets demonstrate that LACT yields substantial improvements (an average +5.5% MRR gain) over advanced methods.
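A rough illustration of the binary-tree decomposition mentioned above, under the assumption that an n-ary conjunctive query is flattened into a left-deep binary tree so each node becomes one binary reasoning step for the LLM; the paper's actual decomposition and prompting may differ.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    op: Optional[str] = None        # "AND" at internal nodes, None at leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    atom: Optional[str] = None      # a relation atom, e.g. "bornIn(x, France)"

def decompose(atoms):
    """Flatten a conjunction of atoms into a left-deep binary tree (assumed form)."""
    if len(atoms) == 1:
        return Node(atom=atoms[0])
    return Node(op="AND", left=decompose(atoms[:-1]), right=Node(atom=atoms[-1]))

tree = decompose(["bornIn(x, France)", "wonAward(x, Turing)", "advisor(x, y)"])
```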
arXiv Detail & Related papers (2024-05-02T18:12:08Z)
- Globally Gated Deep Linear Networks [3.04585143845864]
We introduce Globally Gated Deep Linear Networks (GGDLNs) where gating units are shared among all processing units in each layer.
We derive exact equations for the generalization properties in these networks in the finite-width thermodynamic limit.
Our work is the first exact theoretical solution of learning in a family of nonlinear networks with finite width.
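One possible reading of the shared-gating idea, offered as a hypothetical sketch rather than the paper's definition: a deep linear signal path where each layer is rescaled by a single sigmoid gate, shared across all units of that layer and driven by the raw input. The gate parameterization below is an assumption.

```python
import numpy as np

def ggdln_forward(x, Ws, gs):
    """Hypothetical GGDLN sketch: the signal path h -> W @ h is purely linear,
    and each layer is rescaled by one sigmoid gate shared by all of its units."""
    h = x
    for W, g in zip(Ws, gs):
        gate = 1.0 / (1.0 + np.exp(-(g @ x)))  # one shared scalar gate per layer
        h = gate * (W @ h)                     # gated linear processing
    return h

rng = np.random.default_rng(0)
Ws = [rng.normal(size=(5, 5)) for _ in range(3)]
gs = [rng.normal(size=5) for _ in range(3)]
print(ggdln_forward(rng.normal(size=5), Ws, gs))
```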
arXiv Detail & Related papers (2022-10-31T16:21:56Z)
- A Model of One-Shot Generalization [6.155604731137828]
One-shot generalization refers to the ability of an algorithm to perform transfer learning within a single task.
We show that the most direct neural network architecture for our data model performs one-shot generalization almost perfectly.
arXiv Detail & Related papers (2022-05-29T01:41:29Z)
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
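A minimal sketch of the inducing-point idea described above, with the inducing points living directly in a learned feature space; the RBF kernel, the subset-of-regressors-style predictive mean, and all names are assumptions about the framework, and in IGN the feature map, Z, and u would be trained jointly by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

def ign_predict(X, Z, u, feature_map, length_scale=1.0):
    """X: (n, d) raw inputs; Z: (m, p) inducing points in feature space;
    u: (m,) inducing targets; feature_map: (n, d) -> (n, p) learned embedding."""
    F = feature_map(X)                                        # (n, p)
    Kxz = np.exp(-((F[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
                 / (2 * length_scale ** 2))                   # (n, m) RBF
    Kzz = np.exp(-((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
                 / (2 * length_scale ** 2))                   # (m, m)
    return Kxz @ np.linalg.solve(Kzz + 1e-6 * np.eye(len(Z)), u)

fmap = lambda X: np.tanh(X @ rng.normal(size=(4, 2)))         # toy feature map
print(ign_predict(rng.normal(size=(6, 4)), rng.normal(size=(3, 2)),
                  np.array([1.0, -1.0, 0.5]), fmap))
```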
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
- elBERto: Self-supervised Commonsense Learning for Question Answering [131.51059870970616]
We propose a Self-supervised Bidirectional Representation Learning of Commonsense framework, which is compatible with off-the-shelf QA model architectures.
The framework comprises five self-supervised tasks to force the model to fully exploit the additional training signals from contexts containing rich commonsense.
elBERto achieves substantial improvements on out-of-paragraph and no-effect questions where simple lexical similarity comparison does not help.
arXiv Detail & Related papers (2022-03-17T16:23:45Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Learning Centric Wireless Resource Allocation for Edge Computing: Algorithm and Experiment [15.577056429740951]
Edge intelligence is an emerging network architecture that integrates sensing, communication, and computing components and supports various machine learning applications.
Existing methods ignore two important facts: 1) different models have heterogeneous demands on training data; 2) there is a mismatch between the simulated environment and the real-world environment.
This paper proposes a learning-centric wireless resource allocation scheme that maximizes the worst learning performance across multiple tasks.
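The max-min objective can be sketched as follows, assuming (hypothetically) that each task's error follows a power law b_i * n_i^{-c_i} in its sample count and that samples scale linearly with allocated bandwidth; bisecting on the common error level then yields the allocation that maximizes the worst task. The power-law model and names are illustrative, not the paper's formulation.

```python
import numpy as np

def maxmin_allocation(total_bw, rate, b, c):
    """Bandwidth split maximizing the worst task's learning performance, under
    an assumed error model err_i = b_i * n_i^{-c_i}, n_i = rate_i * bw_i."""
    lo, hi = 0.0, float(b.max())          # bracket on the worst error level
    for _ in range(100):
        e = (lo + hi) / 2
        need = (b / e) ** (1 / c) / rate  # bandwidth making every err_i <= e
        lo, hi = (e, hi) if need.sum() > total_bw else (lo, e)
    return (b / hi) ** (1 / c) / rate     # per-task bandwidth at the best level

bw = maxmin_allocation(10.0, rate=np.array([1.0, 2.0]),
                       b=np.array([0.8, 0.5]), c=np.array([0.4, 0.6]))
print(bw, bw.sum())                       # uses (almost) the full budget
```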
arXiv Detail & Related papers (2020-10-29T06:20:40Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
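A toy sketch of the gradient-supervision idea on a linear scorer, where the input gradient is just the weight vector: the auxiliary term rewards alignment between that gradient and the direction from an example to its counterfactual. The linear model, cosine form, and sign convention are assumptions, not the paper's exact objective.

```python
import numpy as np

def gradient_supervision_loss(w, x, x_cf, y, y_cf):
    """Auxiliary loss for a linear scorer s(x) = w @ x: align the input gradient
    (= w) with the direction x_cf - x that changes the label (assumed form)."""
    d = x_cf - x
    cos = (w @ d) / (np.linalg.norm(w) * np.linalg.norm(d) + 1e-12)
    sign = 1.0 if y_cf > y else -1.0  # gradient should point toward the higher label
    return 1.0 - sign * cos           # added, weighted, to the main training loss
```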
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.