Understanding Robust Generalization in Learning Regular Languages
- URL: http://arxiv.org/abs/2202.09717v1
- Date: Sun, 20 Feb 2022 02:50:09 GMT
- Title: Understanding Robust Generalization in Learning Regular Languages
- Authors: Soham Dan and Osbert Bastani and Dan Roth
- Abstract summary: We study robust generalization in the context of using recurrent neural networks to learn regular languages.
We propose a compositional strategy to address this.
We theoretically prove that the compositional strategy generalizes significantly better than the end-to-end strategy.
- Score: 85.95124524975202
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A key feature of human intelligence is the ability to generalize beyond the
training distribution, for instance, parsing longer sentences than seen in the
past. Currently, deep neural networks struggle to generalize robustly to such
shifts in the data distribution. We study robust generalization in the context
of using recurrent neural networks (RNNs) to learn regular languages. We
hypothesize that standard end-to-end modeling strategies cannot generalize well
to systematic distribution shifts and propose a compositional strategy to
address this. We compare an end-to-end strategy that maps strings to labels
with a compositional strategy that predicts the structure of the deterministic
finite-state automaton (DFA) that accepts the regular language. We
theoretically prove that the compositional strategy generalizes significantly
better than the end-to-end strategy. In our experiments, we implement the
compositional strategy via an auxiliary task where the goal is to predict the
intermediate states visited by the DFA when parsing a string. Our empirical
results support our hypothesis, showing that auxiliary tasks can enable robust
generalization. Interestingly, the end-to-end RNN generalizes significantly
better than the theoretical lower bound, suggesting that it is able to achieve
at least some degree of robust generalization.
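The abstract describes the compositional strategy operationally: in addition to the final accept/reject label, the model is supervised on the intermediate DFA states visited while parsing each string. The snippet below is a minimal sketch of that setup under illustrative assumptions, not the authors' implementation: a toy two-state DFA (strings over {a, b} with an even number of a's) supplies per-step state labels, and a hypothetical PyTorch RNN carries one head for the final label and one auxiliary head for the state sequence.

```python
import torch.nn as nn

# Toy DFA over {a, b} accepting strings with an even number of a's.
# States: {0, 1}; start state 0; accepting states {0}.
DELTA = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
ACCEPTING = {0}

def dfa_targets(string):
    """Return (accept_label, list of states visited after each symbol)."""
    state, visited = 0, []
    for symbol in string:
        state = DELTA[(state, symbol)]
        visited.append(state)
    return int(state in ACCEPTING), visited

class AuxRNN(nn.Module):
    """RNN classifier with an auxiliary head that predicts the DFA state
    reached after each input symbol (the compositional supervision)."""
    def __init__(self, vocab_size=2, hidden=32, num_states=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.RNN(hidden, hidden, batch_first=True)
        self.label_head = nn.Linear(hidden, 2)           # accept / reject
        self.state_head = nn.Linear(hidden, num_states)  # auxiliary task

    def forward(self, tokens):               # tokens: (batch, seq) LongTensor
        h, _ = self.rnn(self.embed(tokens))  # h: (batch, seq, hidden)
        return self.label_head(h[:, -1]), self.state_head(h)

# Both supervision signals for one training string:
label, states = dfa_targets("abba")  # label == 1, states == [1, 1, 1, 0]
```

In this framing, the end-to-end strategy would train only label_head on the final label, while the compositional strategy would add a per-step cross-entropy loss on state_head; the paper's exact architecture and loss weighting may differ.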
Related papers
- Structural generalization in COGS: Supertagging is (almost) all you need [12.991247861348048]
Several recent semantic parsing datasets have put forward important limitations of neural networks in cases where compositional generalization is required.
We extend a neural graph-based semantic parsing framework in several ways to alleviate this issue.
arXiv Detail & Related papers (2023-10-21T21:51:25Z)
- Learning Symbolic Rules over Abstract Meaning Representations for Textual Reinforcement Learning [63.148199057487226]
We propose a modular, NEuroSymbolic Textual Agent (NESTA) that combines a general semantic parser with a rule induction system to learn interpretable rules as policies.
Our experiments show that the proposed NESTA method outperforms deep reinforcement learning-based techniques by achieving better generalization to unseen test games and learning from fewer training interactions.
arXiv Detail & Related papers (2023-07-05T23:21:05Z)
- Towards Understanding the Generalization of Graph Neural Networks [9.217947432437546]
Graph neural networks (GNNs) are the most widely adopted models for learning and representation over graph-structured data.
We first establish high-probability bounds on the generalization gap and gradients in transductive learning.
The theoretical results reveal the architecture-specific factors affecting the generalization gap.
arXiv Detail & Related papers (2023-05-14T03:05:14Z)
- Real-World Compositional Generalization with Disentangled Sequence-to-Sequence Learning [81.24269148865555]
A recently proposed Disentangled sequence-to-sequence model (Dangle) shows promising generalization capability.
We introduce two key modifications to this model which encourage more disentangled representations and improve its compute and memory efficiency.
Specifically, instead of adaptively re-encoding source keys and values at each time step, we disentangle their representations and only re-encode keys periodically (a toy sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-12-12T15:40:30Z)
- Generalization Through the Lens of Learning Dynamics [11.009483845261958]
A machine learning (ML) system must learn to generalize to novel situations in order to yield accurate predictions at deployment.
The impressive generalization performance of deep neural networks has stymied theoreticians.
This thesis will study the learning dynamics of deep neural networks in both supervised and reinforcement learning tasks.
arXiv Detail & Related papers (2022-12-11T00:07:24Z)
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
- Neural Networks and the Chomsky Hierarchy [27.470857324448136]
We study whether insights from the Chomsky hierarchy of formal languages can predict the limits of neural network generalization in practice.
We show negative results where even extensive amounts of data and training time never led to any non-trivial generalization.
Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, and only networks augmented with structured memory can successfully generalize on context-free and context-sensitive tasks.
arXiv Detail & Related papers (2022-07-05T15:06:11Z)
- Disentangled Sequence to Sequence Learning for Compositional Generalization [62.954842223732435]
We propose an extension to sequence-to-sequence models which allows us to learn disentangled representations by adaptively re-encoding the source input.
Experimental results on semantic parsing and machine translation empirically show that our proposal yields more disentangled representations and better generalization.
arXiv Detail & Related papers (2021-10-09T22:27:19Z)
- Predicting Deep Neural Network Generalization with Perturbation Response Curves [58.8755389068888]
We propose a new framework for evaluating the generalization capabilities of trained networks.
Specifically, we introduce two new measures for accurately predicting generalization gaps.
We attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition.
arXiv Detail & Related papers (2021-06-09T01:37:36Z)
- Estimating the Generalization in Deep Neural Networks via Sparsity [15.986873241115651]
We propose a novel method for estimating the generalization gap based on network sparsity.
By training DNNs with a wide range of generalization gaps on popular datasets, we show that our key quantities and linear model can be efficient tools for estimating the generalization gap of DNNs.
arXiv Detail & Related papers (2021-04-02T02:10:32Z)
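For the disentangled sequence-to-sequence (Dangle) follow-up listed above, the periodic key re-encoding idea can be pictured as in the toy sketch below. This is a hypothetical illustration only, not the paper's architecture: the module name PeriodicKeyEncoder, the period parameter, and the key cache are assumptions made for exposition.

```python
import torch.nn as nn

class PeriodicKeyEncoder(nn.Module):
    """Toy sketch: keys are refreshed only every `period` decoding steps
    and cached in between, rather than re-encoded at every step."""
    def __init__(self, dim=64, period=4):
        super().__init__()
        self.key_proj = nn.Linear(dim, dim)
        self.value_proj = nn.Linear(dim, dim)
        self.period = period
        self.cached_keys = None

    def forward(self, source, step):
        # source: (batch, src_len, dim); step: current decoding step index
        if step % self.period == 0 or self.cached_keys is None:
            self.cached_keys = self.key_proj(source)  # periodic key refresh
        values = self.value_proj(source)              # values kept disentangled from keys
        return self.cached_keys, values
```

Skipping the key refresh on most decoding steps is the kind of change that would yield the compute and memory savings the summary mentions.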
This list is automatically generated from the titles and abstracts of the papers on this site.