Related papers: Benchmarking Compositionality with Formal Languages

URL: http://arxiv.org/abs/2208.08195v3
Date: Tue, 1 Aug 2023 15:19:55 GMT
Title: Benchmarking Compositionality with Formal Languages
Authors: Josef Valvoda, Naomi Saphra, Jonathan Rawski, Adina Williams, Ryan Cotterell
Abstract summary: We investigate whether large neural models in NLP can acquire the ability to combine primitive concepts into larger novel combinations while learning from data. By randomly sampling over many transducers, we explore which of their properties contribute to learnability of a compositional relation by a neural network. We find that the models either learn the relations completely or not at all. The key is transition coverage, setting a soft learnability limit at 400 examples per transition.
Score: 64.09083307778951
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recombining known primitive concepts into larger novel combinations is a quintessentially human cognitive capability. Whether large neural models in NLP can acquire this ability while learning from data is an open question. In this paper, we investigate this problem from the perspective of formal languages. We use deterministic finite-state transducers to make an unbounded number of datasets with controllable properties governing compositionality. By randomly sampling over many transducers, we explore which of their properties contribute to learnability of a compositional relation by a neural network. We find that the models either learn the relations completely or not at all. The key is transition coverage, setting a soft learnability limit at 400 examples per transition.
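As a rough illustration of the setup the abstract describes, the sketch below samples a random deterministic finite-state transducer, builds a small dataset, and counts how many examples exercise each transition. It is a minimal sketch, not the authors' code: the function names, the uniform sampling scheme, and the exact way coverage is counted are all assumptions made here for illustration. Only the idea of a per-transition example count and the figure of 400 come from the abstract.

```python
import random
from collections import Counter

def sample_transducer(n_states, in_alphabet, out_alphabet, rng):
    """Sample a random deterministic FST (hypothetical scheme): every
    (state, input symbol) pair maps to one (next state, output symbol)."""
    return {
        (q, a): (rng.randrange(n_states), rng.choice(out_alphabet))
        for q in range(n_states)
        for a in in_alphabet
    }

def transduce(fst, inputs, start=0):
    """Run the transducer; return the output string and transitions used."""
    state, out, used = start, [], []
    for a in inputs:
        used.append((state, a))
        state, b = fst[(state, a)]
        out.append(b)
    return "".join(out), used

def transition_coverage(fst, dataset):
    """Count, per transition, how many examples use it at least once.
    This counting rule is an assumption, not the paper's definition."""
    counts = Counter({t: 0 for t in fst})
    for x in dataset:
        _, used = transduce(fst, x)
        counts.update(set(used))
    return counts

rng = random.Random(0)
fst = sample_transducer(3, "ab", "xy", rng)
dataset = ["".join(rng.choice("ab") for _ in range(8)) for _ in range(200)]
coverage = transition_coverage(fst, dataset)

# Transitions below a chosen threshold; 400 is the soft limit the abstract reports.
under_covered = [t for t, c in coverage.items() if c < 400]
```

With only 200 examples every transition necessarily falls below the threshold; a larger dataset or a smaller transducer changes which transitions appear in `under_covered`.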

Related papers

Scaling of learning time for high dimensional inputs [0.0]
In neural network models, model complexity grows with the number of inputs to each neuron. A precise characterization of this trade-off would help explain the connectivity and learning times observed in artificial and biological networks.
arXiv Detail & Related papers (2026-03-01T16:51:18Z)
On Relation-Specific Neurons in Large Language Models [55.38860695310484]
In large language models, certain neurons can store distinct pieces of knowledge learned during pretraining. We study the Llama-2 family on a chosen set of relations, with a statistics-based method. Our experiments demonstrate the existence of relation-specific neurons.
arXiv Detail & Related papers (2025-02-24T17:33:18Z)
Explainable Neural Networks with Guarantees: A Sparse Estimation Approach [11.142723510517778]
This paper introduces a novel approach to constructing an explainable neural network that harmonizes predictiveness and explainability. Our model, termed SparXnet, is designed as a linear combination of a sparse set of jointly learned features. Our research paves the way for further research on sparse and explainable neural networks with guarantees.
arXiv Detail & Related papers (2025-01-02T12:10:17Z)
Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective. We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention. Experiments show that our model performs competitively to transformers on small to medium sized datasets.
arXiv Detail & Related papers (2023-11-26T06:56:02Z)
Memorization With Neural Nets: Going Beyond the Worst Case [5.03863830033243]
In practice, deep neural networks are often able to easily interpolate their training data. We introduce a simple randomized algorithm that constructs an interpolating three-layer neural network in polynomial time. We obtain guarantees that are independent of the number of samples and hence move beyond worst-case memorization capacity bounds.
arXiv Detail & Related papers (2023-09-30T10:06:05Z)
Modeling rapid language learning by distilling Bayesian priors into artificial neural networks [18.752638142258668]
We show that learning from limited naturalistic data is possible with an approach that combines the strong inductive biases of a Bayesian model with the flexible representations of a neural network. The resulting system can learn formal linguistic patterns from a small number of examples. It can also learn aspects of English syntax from a corpus of natural language.
arXiv Detail & Related papers (2023-05-24T04:11:59Z)
Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models. In detail, we first train neural language models with a novel dependency modeling objective. We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
Robust Generalization of Quadratic Neural Networks via Function Identification [19.87036824512198]
Generalization bounds from learning theory often assume that the test distribution is close to the training distribution. We show that for quadratic neural networks, we can identify the function represented by the model even though we cannot identify its parameters.
arXiv Detail & Related papers (2021-09-22T18:02:00Z)
Compositional Processing Emerges in Neural Networks Solving Math Problems [100.80518350845668]
Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations. We extend this work to the domain of mathematical reasoning, where it is possible to formulate precise hypotheses about how meanings should be composed. Our work shows that neural networks are not only able to infer something about the structured relationships implicit in their training data, but can also deploy this knowledge to guide the composition of individual meanings into composite wholes.
arXiv Detail & Related papers (2021-05-19T07:24:42Z)
Understanding Boolean Function Learnability on Deep Neural Networks [0.0]
Computational learning theory states that many classes of formulas are learnable in polynomial time. This paper addresses the understudied subject of how, in practice, such formulas can be learned by deep neural networks.
arXiv Detail & Related papers (2020-09-13T03:49:20Z)
Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts. We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
arXiv Detail & Related papers (2020-06-24T20:37:05Z)
Neural Additive Models: Interpretable Machine Learning with Neural Nets [77.66871378302774]
Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks. We propose Neural Additive Models (NAMs) which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models. NAMs learn a linear combination of neural networks that each attend to a single input feature.
arXiv Detail & Related papers (2020-04-29T01:28:32Z)
Compositional Languages Emerge in a Neural Iterated Learning Model [27.495624644227888]
Compositionality enables natural language to represent complex concepts via a structured combination of simpler ones. We propose an effective neural iterated learning (NIL) algorithm that, when applied to interacting neural agents, facilitates the emergence of a more structured type of language.
arXiv Detail & Related papers (2020-02-04T15:19:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.