Out-of-Distribution Generalization in Algorithmic Reasoning Through
Curriculum Learning
- URL: http://arxiv.org/abs/2210.03275v1
- Date: Fri, 7 Oct 2022 01:21:05 GMT
- Title: Out-of-Distribution Generalization in Algorithmic Reasoning Through
Curriculum Learning
- Authors: Andrew J. Nam, Mustafa Abdool, Trevor Maxfield, James L. McClelland
- Abstract summary: Out-of-distribution generalization is a longstanding challenge for neural networks.
We show that OODG can occur on complex problems if the training set includes examples sampled from the whole distribution of simpler component tasks.
- Score: 4.191829617421395
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Out-of-distribution generalization (OODG) is a longstanding challenge for
neural networks, and is quite apparent in tasks with well-defined variables and
rules, where explicit use of the rules can solve problems independently of the
particular values of the variables. Large transformer-based language models
have pushed the boundaries on how well neural networks can generalize to novel
inputs, but their complexity obscures how they achieve such robustness. As a step
toward understanding how transformer-based systems generalize, we explore the
question of OODG in smaller scale transformers. Using a reasoning task based on
the puzzle Sudoku, we show that OODG can occur on complex problems if the
training set includes examples sampled from the whole distribution of simpler
component tasks.
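The abstract's central claim, that OODG on a complex task can emerge when training covers the whole distribution of simpler component tasks, can be sketched as a data-assembly routine. Everything below (function names, the toy Sudoku-style "techniques") is illustrative, not the authors' actual pipeline:

```python
import random

def build_training_set(component_tasks, complex_task,
                       n_per_component, n_complex, seed=0):
    """Assemble a training set whose simple examples cover the *whole*
    distribution of component tasks, alongside some complex examples."""
    rng = random.Random(seed)
    data = []
    for task in component_tasks:          # every component is represented
        for _ in range(n_per_component):
            data.append(("simple", task(rng)))
    for _ in range(n_complex):
        data.append(("complex", complex_task(rng)))
    rng.shuffle(data)
    return data

# Hypothetical Sudoku-style tasks: each returns one training example.
def only_choice(rng):
    return {"technique": "only_choice", "cell": rng.randrange(81)}

def elimination(rng):
    return {"technique": "elimination", "cell": rng.randrange(81)}

def full_puzzle(rng):
    return {"technique": "combined", "cells": 81}

data = build_training_set([only_choice, elimination], full_puzzle,
                          n_per_component=3, n_complex=2)
```

The point of the sketch is the coverage constraint: every component task contributes examples, rather than the training set sampling only a subset of the simple tasks.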
Related papers
- In-Context Learning with Representations: Contextual Generalization of Trained Transformers [66.78052387054593]
In-context learning (ICL) refers to a capability of pretrained large language models, which can learn a new task given a few examples during inference.
This paper investigates the training dynamics of transformers by gradient descent through the lens of non-linear regression tasks.
arXiv Detail & Related papers (2024-08-19T16:47:46Z)
- INViT: A Generalizable Routing Problem Solver with Invariant Nested View Transformer [17.10555702634864]
Deep reinforcement learning has shown promising results for learning fast heuristics to solve routing problems.
Most solvers, however, struggle to generalize to unseen distributions or to distributions at different scales.
We propose a novel architecture, called Invariant Nested View Transformer (INViT), which enforces a nested design together with invariant views inside the encoders to promote the generalizability of the learned solver.
arXiv Detail & Related papers (2024-02-04T02:09:30Z)
- What Algorithms can Transformers Learn? A Study in Length Generalization [23.970598914609916]
We study the scope of Transformers' abilities in the specific setting of length generalization on algorithmic tasks.
Specifically, we leverage RASP -- a programming language designed for the computational model of a Transformer.
Our work provides a novel perspective on the mechanisms of compositional generalization and the algorithmic capabilities of Transformers.
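RASP describes Transformer computations through two core primitives, select (build an attention pattern) and aggregate (pool values through it). A much-simplified Python rendering of a RASP-style reverse program; the names mirror RASP, but this is a toy sketch, not the real language:

```python
def select(keys, queries, predicate):
    """Attention pattern: sel[q][k] is True where predicate(k, q) holds."""
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(sel, values):
    """Each query position copies the value at its selected key position."""
    out = []
    for row in sel:
        picked = [v for v, s in zip(values, row) if s]
        out.append(picked[0] if picked else None)
    return out

def reverse(tokens):
    n = len(tokens)
    positions = list(range(n))
    # Position q attends to position n - 1 - q ...
    sel = select(positions, positions, lambda k, q: k == n - 1 - q)
    # ... and copies the token found there.
    return aggregate(sel, tokens)
```

Because the program is stated over positions rather than concrete lengths, it works for any input length, which is the intuition behind using RASP expressibility to reason about length generalization.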
arXiv Detail & Related papers (2023-10-24T17:43:29Z)
- Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules for constructing model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
- Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks [6.525090891505941]
We show how a causal transformer can perform a set of algorithmic tasks, including copying, sorting, and hierarchical compositions.
We show that two-layer transformers learn generalizable solutions to multi-level problems and develop signs of systematic task decomposition.
These results provide key insights into how transformer models may be capable of decomposing complex decisions into reusable, multi-level policies.
arXiv Detail & Related papers (2022-10-02T00:46:36Z)
- Neural Networks and the Chomsky Hierarchy [27.470857324448136]
We study whether insights from the Chomsky hierarchy of formal languages can predict the limits of neural network generalization in practice.
We show negative results where even extensive amounts of data and training time never led to any non-trivial generalization.
Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, and only networks augmented with structured memory can successfully generalize on context-free and context-sensitive tasks.
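The hierarchy at issue can be made concrete with two toy languages: parity (regular, decidable by a finite-state machine) and Dyck-1 balanced parentheses (context-free, requiring a counter or stack). These are standard examples of the task classes on which, per the summary above, plain RNNs and Transformers diverge from memory-augmented networks; the checkers below are illustrative:

```python
def is_parity_even(bits):
    """Regular language: strings with an even number of 1s.
    A single bit of state (a 2-state automaton) suffices."""
    state = 0
    for b in bits:
        if b == 1:
            state ^= 1
    return state == 0

def is_dyck1(s):
    """Context-free language: balanced parentheses.
    Requires an unbounded counter (a degenerate stack)."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:          # a ')' closed more than was opened
            return False
    return depth == 0
```

The structural gap is visible in the code: parity needs only a fixed-size state, while Dyck-1 needs a counter that can grow with input length, which is the kind of structured memory the paper finds necessary for generalization beyond regular tasks.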
arXiv Detail & Related papers (2022-07-05T15:06:11Z)
- Compositional Generalization and Decomposition in Neural Program Synthesis [59.356261137313275]
In this paper, we focus on measuring the ability of learned program synthesizers to compositionally generalize.
We first characterize several different axes along which program synthesis methods would be desired to generalize.
We introduce a benchmark suite of tasks to assess these abilities based on two popular existing datasets.
arXiv Detail & Related papers (2022-04-07T22:16:05Z)
- The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization [8.424405898986118]
We propose two modifications to the Transformer architecture, copy gate and geometric attention.
Our novel Neural Data Router (NDR) achieves 100% length generalization accuracy on the classic compositional table lookup task.
NDR's attention and gating patterns tend to be interpretable as an intuitive form of neural routing.
arXiv Detail & Related papers (2021-10-14T21:24:27Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Neural Complexity Measures [96.06344259626127]
We propose Neural Complexity (NC), a meta-learning framework for predicting generalization.
Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven way.
arXiv Detail & Related papers (2020-08-07T02:12:10Z)
- Total Deep Variation: A Stable Regularizer for Inverse Problems [71.90933869570914]
We introduce the data-driven general-purpose total deep variation regularizer.
In its core, a convolutional neural network extracts local features on multiple scales and in successive blocks.
We achieve state-of-the-art results for numerous imaging tasks.
arXiv Detail & Related papers (2020-06-15T21:54:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.