Benchmarking Neural Network Generalization for Grammar Induction
- URL: http://arxiv.org/abs/2308.08253v2
- Date: Fri, 25 Aug 2023 13:40:31 GMT
- Title: Benchmarking Neural Network Generalization for Grammar Induction
- Authors: Nur Lan, Emmanuel Chemla, Roni Katzir
- Abstract summary: We provide a measure of neural network generalization based on fully specified formal languages.
The benchmark includes languages such as $a^nb^n$, $a^nb^nc^n$, $a^nb^mc^{n+m}$, and Dyck-1 and 2.
- Score: 3.2228025627337864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How well do neural networks generalize? Even for grammar induction tasks,
where the target generalization is fully known, previous works have left the
question open, testing very limited ranges beyond the training set and using
different success criteria. We provide a measure of neural network
generalization based on fully specified formal languages. Given a model and a
formal grammar, the method assigns a generalization score representing how well
a model generalizes to unseen samples in inverse relation to the amount of data
it was trained on. The benchmark includes languages such as $a^nb^n$,
$a^nb^nc^n$, $a^nb^mc^{n+m}$, and Dyck-1 and 2. We evaluate selected
architectures using the benchmark and find that networks trained with a Minimum
Description Length (MDL) objective generalize better, and with less data, than
networks trained using standard loss functions. The benchmark is available at
https://github.com/taucompling/bliss.
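For illustration, below is a minimal Python sketch of the kinds of formal languages the benchmark covers and of a toy out-of-range generalization check. The function names and the scoring heuristic are hypothetical and do not reflect the actual API of the bliss repository linked above; the paper's own score additionally factors in the amount of training data, which this toy version omits.

    # Illustrative only: hypothetical helpers, not the bliss benchmark API.
    import random

    def a_n_b_n(n: int) -> str:
        """Return a^n b^n, e.g. n=3 -> 'aaabbb'."""
        return "a" * n + "b" * n

    def a_n_b_n_c_n(n: int) -> str:
        """Return a^n b^n c^n."""
        return "a" * n + "b" * n + "c" * n

    def a_n_b_m_c_n_plus_m(n: int, m: int) -> str:
        """Return a^n b^m c^(n+m)."""
        return "a" * n + "b" * m + "c" * (n + m)

    def dyck_1(n_pairs: int, rng: random.Random) -> str:
        """Sample a balanced bracket string (Dyck-1) with n_pairs pairs.
        Not a uniform sampler; just a random walk that stays well-formed."""
        out, open_brackets, remaining = [], 0, n_pairs
        while remaining > 0 or open_brackets > 0:
            if remaining > 0 and (open_brackets == 0 or rng.random() < 0.5):
                out.append("(")
                open_brackets += 1
                remaining -= 1
            else:
                out.append(")")
                open_brackets -= 1
        return "".join(out)

    def generalization_score(accepts, train_max_n: int, test_max_n: int) -> float:
        """Toy score: fraction of unseen a^n b^n strings, with n beyond the
        training range, that a model still accepts. `accepts` is any callable
        mapping a string to True/False (e.g. a thresholded network output)."""
        unseen = [a_n_b_n(n) for n in range(train_max_n + 1, test_max_n + 1)]
        return sum(accepts(s) for s in unseen) / len(unseen)

    # Example: a rule-based recognizer of a^n b^n scores 1.0 on unseen lengths.
    perfect = lambda s: s == a_n_b_n(s.count("a"))
    print(generalization_score(perfect, train_max_n=50, test_max_n=100))  # 1.0

A trained network would replace `perfect` here; testing it on strings longer than anything seen during training is the kind of out-of-range evaluation the benchmark formalizes.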
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers.
It is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z) - Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z) - On the optimization and generalization of overparameterized implicit neural networks [25.237054775800164]
Implicit neural networks have become increasingly attractive in the machine learning community.
We show that global convergence is guaranteed, even if only the implicit layer is trained.
This paper investigates the generalization error for implicit neural networks.
arXiv Detail & Related papers (2022-09-30T16:19:46Z) - Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power [15.210336733607488]
We show that for binary classification problems with well-separated data, there exists a constant robust generalization gap unless the size of the neural network is exponential.
We establish an improved upper bound of $\exp(\mathcal{O}(k))$ for the network size to achieve low robust generalization error.
arXiv Detail & Related papers (2022-05-27T09:53:04Z) - Understanding Robust Generalization in Learning Regular Languages [85.95124524975202]
We study robust generalization in the context of using recurrent neural networks to learn regular languages.
We propose a compositional strategy to address this.
We theoretically prove that the compositional strategy generalizes significantly better than the end-to-end strategy.
arXiv Detail & Related papers (2022-02-20T02:50:09Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Sequence-to-Sequence Learning with Latent Neural Grammars [12.624691611049341]
Sequence-to-sequence learning with neural networks has become the de facto standard for sequence prediction tasks.
While flexible and performant, these models often require large datasets for training and can fail spectacularly on benchmarks designed to test for compositional generalization.
This work explores an alternative, hierarchical approach to sequence-to-sequence learning with quasi-synchronous grammars.
arXiv Detail & Related papers (2021-09-02T17:58:08Z) - Estimating the Generalization in Deep Neural Networks via Sparsity [15.986873241115651]
We propose a novel method for estimating the generalization gap based on network sparsity.
By training DNNs with a wide range of generalization gaps on popular datasets, we show that our key quantities and linear model could be efficient tools for estimating the generalization gap of DNNs.
arXiv Detail & Related papers (2021-04-02T02:10:32Z) - Text Classification with Few Examples using Controlled Generalization [58.971750512415134]
Current practice relies on pre-trained word embeddings to map words unseen in training to similar seen ones.
Our alternative begins with sparse pre-trained representations derived from unlabeled parsed corpora.
We show that a feed-forward network over these vectors is especially effective in low-data scenarios.
arXiv Detail & Related papers (2020-05-18T06:04:58Z) - Towards GAN Benchmarks Which Require Generalization [48.075521136623564]
We argue that reliably estimating such an evaluation function must require a large sample from the model.
We turn to neural network divergences (NNDs) which are defined in terms of a neural network trained to distinguish between distributions.
The resulting benchmarks cannot be "won" by training set memorization, while still being perceptually correlated and computable only from samples.
arXiv Detail & Related papers (2020-01-10T20:18:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.