Out-of-Distribution Generalization in Algorithmic Reasoning Through
Curriculum Learning
- URL: http://arxiv.org/abs/2210.03275v1
- Date: Fri, 7 Oct 2022 01:21:05 GMT
- Title: Out-of-Distribution Generalization in Algorithmic Reasoning Through
Curriculum Learning
- Authors: Andrew J. Nam, Mustafa Abdool, Trevor Maxfield, James L. McClelland
- Abstract summary: Out-of-distribution generalization is a longstanding challenge for neural networks.
We show that OODG can occur on complex problems if the training set includes examples sampled from the whole distribution of simpler component tasks.
- Score: 4.191829617421395
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Out-of-distribution generalization (OODG) is a longstanding challenge for
neural networks, and is quite apparent in tasks with well-defined variables and
rules, where explicit use of the rules can solve problems independently of the
particular values of the variables. Large transformer-based language models
have pushed the boundaries on how well neural networks can generalize to novel
inputs, but their complexity obscures how they achieve such robustness. As a step
toward understanding how transformer-based systems generalize, we explore the
question of OODG in smaller scale transformers. Using a reasoning task based on
the puzzle Sudoku, we show that OODG can occur on complex problems if the
training set includes examples sampled from the whole distribution of simpler
component tasks.
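The abstract's central claim, that OODG on a complex task can emerge when training covers the whole distribution of simpler component tasks, can be sketched as a data-assembly routine. Everything below (function names, the toy Sudoku-style "techniques") is illustrative, not the authors' actual pipeline:

```python
import random

def build_training_set(component_tasks, complex_task,
                       n_per_component, n_complex, seed=0):
    """Assemble a training set whose simple examples cover the *whole*
    distribution of component tasks, alongside some complex examples."""
    rng = random.Random(seed)
    data = []
    for task in component_tasks:          # every component is represented
        for _ in range(n_per_component):
            data.append(("simple", task(rng)))
    for _ in range(n_complex):
        data.append(("complex", complex_task(rng)))
    rng.shuffle(data)
    return data

# Hypothetical Sudoku-style tasks: each returns one training example.
def only_choice(rng):
    return {"technique": "only_choice", "cell": rng.randrange(81)}

def elimination(rng):
    return {"technique": "elimination", "cell": rng.randrange(81)}

def full_puzzle(rng):
    return {"technique": "combined", "cells": 81}

data = build_training_set([only_choice, elimination], full_puzzle,
                          n_per_component=3, n_complex=2)
```

The point of the sketch is the coverage constraint: every component task contributes examples, rather than the training set sampling only a subset of the simple tasks.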
Related papers
- In-Context Learning with Representations: Contextual Generalization of Trained Transformers [66.78052387054593]
In-context learning (ICL) refers to a capability of pretrained large language models, which can learn a new task given a few examples during inference.
This paper investigates the training dynamics of transformers by gradient descent through the lens of non-linear regression tasks.
arXiv Detail & Related papers (2024-08-19T16:47:46Z)
- INViT: A Generalizable Routing Problem Solver with Invariant Nested View Transformer [17.10555702634864]
Deep reinforcement learning has shown promising results for learning fast heuristics to solve routing problems.
Most solvers, however, struggle to generalize to unseen distributions or to distributions at different scales.
We propose a novel architecture, called Invariant Nested View Transformer (INViT), which enforces a nested design together with invariant views inside the encoders to promote the generalizability of the learned solver.
arXiv Detail & Related papers (2024-02-04T02:09:30Z)
- What Algorithms can Transformers Learn? A Study in Length Generalization [23.970598914609916]
We study the scope of Transformers' abilities in the specific setting of length generalization on algorithmic tasks.
Specifically, we leverage RASP -- a programming language designed for the computational model of a Transformer.
Our work provides a novel perspective on the mechanisms of compositional generalization and the algorithmic capabilities of Transformers.
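RASP describes Transformer computations through two core primitives, select (build an attention pattern) and aggregate (pool values through it). A much-simplified Python rendering of a RASP-style reverse program; the names mirror RASP, but this is a toy sketch, not the real language:

```python
def select(keys, queries, predicate):
    """Attention pattern: sel[q][k] is True where predicate(k, q) holds."""
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(sel, values):
    """Each query position copies the value at its selected key position."""
    out = []
    for row in sel:
        picked = [v for v, s in zip(values, row) if s]
        out.append(picked[0] if picked else None)
    return out

def reverse(tokens):
    n = len(tokens)
    positions = list(range(n))
    # Position q attends to position n - 1 - q ...
    sel = select(positions, positions, lambda k, q: k == n - 1 - q)
    # ... and copies the token found there.
    return aggregate(sel, tokens)
```

Because the program is stated over positions rather than concrete lengths, it works for any input length, which is the intuition behind using RASP expressibility to reason about length generalization.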
arXiv Detail & Related papers (2023-10-24T17:43:29Z)
- Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules for constructing model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
- Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks [6.525090891505941]
We show how a causal transformer can perform a set of algorithmic tasks, including copying, sorting, and hierarchical compositions.
We show that two-layer transformers learn generalizable solutions to multi-level problems and develop signs of systematic task decomposition.
These results provide key insights into how transformer models may be capable of decomposing complex decisions into reusable, multi-level policies.
arXiv Detail & Related papers (2022-10-02T00:46:36Z)
- Neural Networks and the Chomsky Hierarchy [27.470857324448136]
We study whether insights from the Chomsky hierarchy of formal languages can predict the limits of neural network generalization in practice.
We show negative results where even extensive amounts of data and training time never led to any non-trivial generalization.
Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, and only networks augmented with structured memory can successfully generalize on context-free and context-sensitive tasks.
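The hierarchy at issue can be made concrete with two toy languages: parity (regular, decidable by a finite-state machine) and Dyck-1 balanced parentheses (context-free, requiring a counter or stack). These are standard examples of the task classes on which, per the summary above, plain RNNs and Transformers diverge from memory-augmented networks; the checkers below are illustrative:

```python
def is_parity_even(bits):
    """Regular language: strings with an even number of 1s.
    A single bit of state (a 2-state automaton) suffices."""
    state = 0
    for b in bits:
        if b == 1:
            state ^= 1
    return state == 0

def is_dyck1(s):
    """Context-free language: balanced parentheses.
    Requires an unbounded counter (a degenerate stack)."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:          # a ')' closed more than was opened
            return False
    return depth == 0
```

The structural gap is visible in the code: parity needs only a fixed-size state, while Dyck-1 needs a counter that can grow with input length, which is the kind of structured memory the paper finds necessary for generalization beyond regular tasks.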
arXiv Detail & Related papers (2022-07-05T15:06:11Z)
- Compositional Generalization and Decomposition in Neural Program Synthesis [59.356261137313275]
In this paper, we focus on measuring the ability of learned program synthesizers to compositionally generalize.
We first characterize several different axes along which program synthesis methods would be desired to generalize.
We introduce a benchmark suite of tasks to assess these abilities based on two popular existing datasets.
arXiv Detail & Related papers (2022-04-07T22:16:05Z)
- The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization [8.424405898986118]
We propose two modifications to the Transformer architecture, copy gate and geometric attention.
Our novel Neural Data Router (NDR) achieves 100% length generalization accuracy on the classic compositional table lookup task.
NDR's attention and gating patterns tend to be interpretable as an intuitive form of neural routing.
arXiv Detail & Related papers (2021-10-14T21:24:27Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Neural Complexity Measures [96.06344259626127]
We propose Neural Complexity (NC), a meta-learning framework for predicting generalization.
Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven way.
arXiv Detail & Related papers (2020-08-07T02:12:10Z)
- Total Deep Variation: A Stable Regularizer for Inverse Problems [71.90933869570914]
We introduce the data-driven general-purpose total deep variation regularizer.
In its core, a convolutional neural network extracts local features on multiple scales and in successive blocks.
We achieve state-of-the-art results for numerous imaging tasks.
arXiv Detail & Related papers (2020-06-15T21:54:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.