Understanding Robust Generalization in Learning Regular Languages
- URL: http://arxiv.org/abs/2202.09717v1
- Date: Sun, 20 Feb 2022 02:50:09 GMT
- Title: Understanding Robust Generalization in Learning Regular Languages
- Authors: Soham Dan and Osbert Bastani and Dan Roth
- Abstract summary: We study robust generalization in the context of using recurrent neural networks to learn regular languages.
We propose a compositional strategy to address this.
We theoretically prove that the compositional strategy generalizes significantly better than the end-to-end strategy.
- Score: 85.95124524975202
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A key feature of human intelligence is the ability to generalize beyond the
training distribution, for instance, parsing longer sentences than seen in the
past. Currently, deep neural networks struggle to generalize robustly to such
shifts in the data distribution. We study robust generalization in the context
of using recurrent neural networks (RNNs) to learn regular languages. We
hypothesize that standard end-to-end modeling strategies cannot generalize well
to systematic distribution shifts and propose a compositional strategy to
address this. We compare an end-to-end strategy that maps strings to labels
with a compositional strategy that predicts the structure of the deterministic
finite-state automaton (DFA) that accepts the regular language. We
theoretically prove that the compositional strategy generalizes significantly
better than the end-to-end strategy. In our experiments, we implement the
compositional strategy via an auxiliary task where the goal is to predict the
intermediate states visited by the DFA when parsing a string. Our empirical
results support our hypothesis, showing that auxiliary tasks can enable robust
generalization. Interestingly, the end-to-end RNN generalizes significantly
better than the theoretical lower bound, suggesting that it is able to achieve
at least some degree of robust generalization.
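The abstract describes the compositional strategy operationally: in addition to the final accept/reject label, the model is supervised on the intermediate DFA states visited while parsing each string. The snippet below is a minimal sketch of that setup under illustrative assumptions, not the authors' implementation: a toy two-state DFA (strings over {a, b} with an even number of a's) supplies per-step state labels, and a hypothetical PyTorch RNN carries one head for the final label and one auxiliary head for the state sequence.

```python
import torch.nn as nn

# Toy DFA over {a, b} accepting strings with an even number of a's.
# States: {0, 1}; start state 0; accepting states {0}.
DELTA = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
ACCEPTING = {0}

def dfa_targets(string):
    """Return (accept_label, list of states visited after each symbol)."""
    state, visited = 0, []
    for symbol in string:
        state = DELTA[(state, symbol)]
        visited.append(state)
    return int(state in ACCEPTING), visited

class AuxRNN(nn.Module):
    """RNN classifier with an auxiliary head that predicts the DFA state
    reached after each input symbol (the compositional supervision)."""
    def __init__(self, vocab_size=2, hidden=32, num_states=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.RNN(hidden, hidden, batch_first=True)
        self.label_head = nn.Linear(hidden, 2)           # accept / reject
        self.state_head = nn.Linear(hidden, num_states)  # auxiliary task

    def forward(self, tokens):               # tokens: (batch, seq) LongTensor
        h, _ = self.rnn(self.embed(tokens))  # h: (batch, seq, hidden)
        return self.label_head(h[:, -1]), self.state_head(h)

# Both supervision signals for one training string:
label, states = dfa_targets("abba")  # label == 1, states == [1, 1, 1, 0]
```

In this framing, the end-to-end strategy would train only label_head on the final label, while the compositional strategy would add a per-step cross-entropy loss on state_head; the paper's exact architecture and loss weighting may differ.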
Related papers
- Structural generalization in COGS: Supertagging is (almost) all you need [12.991247861348048]
Several recent semantic parsing datasets have put forward important limitations of neural networks in cases where compositional generalization is required.
We extend a neural graph-based semantic parsing framework in several ways to alleviate this issue.
arXiv Detail & Related papers (2023-10-21T21:51:25Z)
- Learning Symbolic Rules over Abstract Meaning Representations for Textual Reinforcement Learning [63.148199057487226]
We propose a modular, NEuroSymbolic Textual Agent (NESTA) that combines a general semantic parser with a rule induction system to learn interpretable rules as policies.
Our experiments show that the proposed NESTA method outperforms deep reinforcement learning-based techniques by achieving better generalization to unseen test games and learning from fewer training interactions.
arXiv Detail & Related papers (2023-07-05T23:21:05Z)
- Towards Understanding the Generalization of Graph Neural Networks [9.217947432437546]
Graph neural networks (GNNs) are the most widely adopted models for learning and representation over graph-structured data.
We first establish high-probability bounds on the generalization gap and gradients in transductive learning.
The theoretical results reveal the architecture-specific factors affecting the generalization gap.
arXiv Detail & Related papers (2023-05-14T03:05:14Z)
- Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning [81.24269148865555]
A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically (a toy sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-12-12T15:40:30Z)
- Generalization Through the Lens of Learning Dynamics [11.009483845261958]
A machine learning (ML) system must learn to generalize to novel situations in order to yield accurate predictions at deployment.
The impressive generalization performance of deep neural networks has stymied theoreticians.
This thesis will study the learning dynamics of deep neural networks in both supervised and reinforcement learning tasks.
arXiv Detail & Related papers (2022-12-11T00:07:24Z)
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
- Neural Networks and the Chomsky Hierarchy [27.470857324448136]
We study whether insights from the Chomsky hierarchy of formal languages can predict the limits of neural network generalization in practice.
We show negative results where even extensive amounts of data and training time never led to any non-trivial generalization.
Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, and only networks augmented with structured memory can successfully generalize on context-free and context-sensitive tasks.
arXiv Detail & Related papers (2022-07-05T15:06:11Z)
- Disentangled Sequence to Sequence Learning for Compositional Generalization [62.954842223732435]
We propose an extension to sequence-to-sequence models which allows us to learn disentangled representations by adaptively re-encoding the source input.
Experimental results on semantic parsing and machine translation empirically show that our proposal yields more disentangled representations and better generalization.
arXiv Detail & Related papers (2021-10-09T22:27:19Z)
- Predicting Deep Neural Network Generalization with Perturbation Response Curves [58.8755389068888]
We propose a new framework for evaluating the generalization capabilities of trained networks.
Specifically, we introduce two new measures for accurately predicting generalization gaps.
We attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition.
arXiv Detail & Related papers (2021-06-09T01:37:36Z)
- Estimating the Generalization in Deep Neural Networks via Sparsity [15.986873241115651]
We propose a novel method for estimating the generalization gap based on network sparsity.
By training DNNs with a wide range of generalization gaps on popular datasets, we show that our key quantities and linear model can be efficient tools for estimating the generalization gap of DNNs.
arXiv Detail & Related papers (2021-04-02T02:10:32Z)
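For the disentangled sequence-to-sequence (Dangle) follow-up listed above, the periodic key re-encoding idea can be pictured as in the toy sketch below. This is a hypothetical illustration only, not the paper's architecture: the module name PeriodicKeyEncoder, the period parameter, and the key cache are assumptions made for exposition.

```python
import torch.nn as nn

class PeriodicKeyEncoder(nn.Module):
    """Toy sketch: keys are refreshed only every `period` decoding steps
    and cached in between, rather than re-encoded at every step."""
    def __init__(self, dim=64, period=4):
        super().__init__()
        self.key_proj = nn.Linear(dim, dim)
        self.value_proj = nn.Linear(dim, dim)
        self.period = period
        self.cached_keys = None

    def forward(self, source, step):
        # source: (batch, src_len, dim); step: current decoding step index
        if step % self.period == 0 or self.cached_keys is None:
            self.cached_keys = self.key_proj(source)  # periodic key refresh
        values = self.value_proj(source)              # values kept disentangled from keys
        return self.cached_keys, values
```

Skipping the key refresh on most decoding steps is the kind of change that would yield the compute and memory savings the summary mentions.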
This list is automatically generated from the titles and abstracts of the papers on this site.