Benchmarking Compositionality with Formal Languages
- URL: http://arxiv.org/abs/2208.08195v3
- Date: Tue, 1 Aug 2023 15:19:55 GMT
- Title: Benchmarking Compositionality with Formal Languages
- Authors: Josef Valvoda, Naomi Saphra, Jonathan Rawski, Adina Williams, Ryan
Cotterell
- Abstract summary: We investigate whether large neural models in NLP can acquire the ability tocombining primitive concepts into larger novel combinations while learning from data.
By randomly sampling over many transducers, we explore which of their properties contribute to learnability of a compositional relation by a neural network.
We find that the models either learn the relations completely or not at all. The key is transition coverage, setting a soft learnability limit at 400 examples per transition.
- Score: 64.09083307778951
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recombining known primitive concepts into larger novel combinations is a
quintessentially human cognitive capability. Whether large neural models in NLP
can acquire this ability while learning from data is an open question. In this
paper, we investigate this problem from the perspective of formal languages. We
use deterministic finite-state transducers to make an unbounded number of
datasets with controllable properties governing compositionality. By randomly
sampling over many transducers, we explore which of their properties contribute
to learnability of a compositional relation by a neural network. We find that
the models either learn the relations completely or not at all. The key is
transition coverage, setting a soft learnability limit at 400 examples per
transition.
Related papers
- Probabilistic Transformer: A Probabilistic Dependency Model for
Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively to transformers on small to medium sized datasets.
arXiv Detail & Related papers (2023-11-26T06:56:02Z) - Modeling rapid language learning by distilling Bayesian priors into
artificial neural networks [18.752638142258668]
We show that learning from limited naturalistic data is possible with an approach that combines the strong inductive biases of a Bayesian model with the flexible representations of a neural network.
The resulting system can learn formal linguistic patterns from a small number of examples.
It can also learn aspects of English syntax from a corpus of natural language.
arXiv Detail & Related papers (2023-05-24T04:11:59Z) - Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z) - Robust Generalization of Quadratic Neural Networks via Function
Identification [19.87036824512198]
Generalization bounds from learning theory often assume that the test distribution is close to the training distribution.
We show that for quadratic neural networks, we can identify the function represented by the model even though we cannot identify its parameters.
arXiv Detail & Related papers (2021-09-22T18:02:00Z) - Compositional Processing Emerges in Neural Networks Solving Math
Problems [100.80518350845668]
Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations.
We extend this work to the domain of mathematical reasoning, where it is possible to formulate precise hypotheses about how meanings should be composed.
Our work shows that neural networks are not only able to infer something about the structured relationships implicit in their training data, but can also deploy this knowledge to guide the composition of individual meanings into composite wholes.
arXiv Detail & Related papers (2021-05-19T07:24:42Z) - Understanding Boolean Function Learnability on Deep Neural Networks [0.0]
Computational learning theory states that many classes of formulas are learnable in time.
This paper addresses the understudied subject of how, in practice, such formulas can be learned by deep neural networks.
arXiv Detail & Related papers (2020-09-13T03:49:20Z) - Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z) - Neural Additive Models: Interpretable Machine Learning with Neural Nets [77.66871378302774]
Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks.
We propose Neural Additive Models (NAMs) which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models.
NAMs learn a linear combination of neural networks that each attend to a single input feature.
arXiv Detail & Related papers (2020-04-29T01:28:32Z) - Compositional Languages Emerge in a Neural Iterated Learning Model [27.495624644227888]
compositionality enables natural language to represent complex concepts via a structured combination of simpler ones.
We propose an effective neural iterated learning (NIL) algorithm that, when applied to interacting neural agents, facilitates the emergence of a more structured type of language.
arXiv Detail & Related papers (2020-02-04T15:19:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.