Injecting structural hints: Using language models to study inductive
biases in language learning
- URL: http://arxiv.org/abs/2304.13060v2
- Date: Sun, 29 Oct 2023 17:14:06 GMT
- Title: Injecting structural hints: Using language models to study inductive
biases in language learning
- Authors: Isabel Papadimitriou and Dan Jurafsky
- Abstract summary: We inject inductive bias into language models by pretraining on formally-structured data.
We then evaluate the biased learners' ability to learn typologically-diverse natural languages.
We show that non-context-free relationships form the best inductive biases.
- Score: 40.8902073270634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Both humans and large language models are able to learn language without
explicit structural supervision. What inductive biases make this learning
possible? We address this fundamental cognitive question by leveraging
transformer language models: we inject inductive bias into language models by
pretraining on formally-structured data, and then evaluate the biased learners'
ability to learn typologically-diverse natural languages. Our experimental
setup creates a testbed for hypotheses about inductive bias in human language
learning. We investigate the effect of injecting models with three types of
inductive bias: 1) recursive, hierarchical processing, 2) crossing token-token
relationships that can't be modeled by context-free grammars, and 3) a Zipfian
power-law vocabulary distribution. We show that non-context-free relationships
form the best inductive biases. Our study leverages the capabilities of
transformer models to run controlled language learning experiments that are not
possible to run on humans, and surfaces hypotheses about the structures that
facilitate language learning in both humans and machines.
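To make the three pretraining regimes concrete, below is a minimal, hypothetical sketch of how formally-structured corpora of this kind could be generated: nested bracket pairs for hierarchical (context-free) structure, cross-serial pairs for non-context-free dependencies, and Zipf-distributed unigrams for the power-law vocabulary bias. The vocabulary size, sequence lengths, token format, and sampling parameters are illustrative assumptions, not the paper's actual data-generation procedure.

```python
import random

def nested_sequence(vocab_size=500, depth=10):
    """Hierarchical (context-free) bias: matched open/close token pairs
    with nested, bracket-like structure (a Dyck-style toy language)."""
    seq, stack = [], []
    for _ in range(depth):
        tok = random.randrange(vocab_size)
        seq.append(f"<{tok}")        # opening token
        stack.append(tok)
    while stack:
        seq.append(f"{stack.pop()}>")  # matching closer, innermost first
    return seq

def crossing_sequence(vocab_size=500, pairs=10):
    """Non-context-free bias: token pairs whose dependencies cross,
    as in a copy language / cross-serial dependencies (all first halves
    in order, then all second halves in the same order)."""
    firsts = [random.randrange(vocab_size) for _ in range(pairs)]
    return [f"<{t}" for t in firsts] + [f"{t}>" for t in firsts]

def zipfian_sequence(vocab_size=500, length=20, alpha=1.0):
    """Zipfian bias: unigram sampling with no syntactic structure, where
    the word of rank r has probability proportional to 1 / r**alpha."""
    weights = [1.0 / (r ** alpha) for r in range(1, vocab_size + 1)]
    return [str(t) for t in random.choices(range(vocab_size), weights=weights, k=length)]

if __name__ == "__main__":
    random.seed(0)
    print(" ".join(nested_sequence()))
    print(" ".join(crossing_sequence()))
    print(" ".join(zipfian_sequence()))
```

A biased learner would then be pretrained on one of these synthetic corpora before being fine-tuned and evaluated on typologically-diverse natural languages, so that any difference in natural-language learning can be attributed to the injected structure.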
Related papers
- Testing learning hypotheses using neural networks by manipulating learning data [20.525923251193472]
We show that a neural network language model can learn restrictions to the passive that are similar to those displayed by humans.
We find that while the frequency with which a verb appears in the passive significantly affects its passivizability, the semantics of the verb does not.
arXiv Detail & Related papers (2024-07-05T15:41:30Z)
- Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning [56.03057119008865]
We show that scaling diffusion language models can effectively make them strong language learners.
We build competent diffusion language models at scale by first acquiring knowledge from massive data.
Experiments show that scaling diffusion language models consistently improves performance across downstream language tasks.
arXiv Detail & Related papers (2023-08-23T16:01:12Z)
- Language Models as Inductive Reasoners [125.99461874008703]
We propose a new paradigm (task) for inductive reasoning, which is to induce natural language rules from natural language facts.
We create a dataset termed DEER containing 1.2k rule-fact pairs for the task, where rules and facts are written in natural language.
We provide the first comprehensive analysis of how well pretrained language models can induce natural language rules from natural language facts.
arXiv Detail & Related papers (2022-12-21T11:12:14Z)
- Discovering Latent Knowledge in Language Models Without Supervision [72.95136739040676]
Existing techniques for training language models can be misaligned with the truth.
We propose directly finding latent knowledge inside the internal activations of a language model in a purely unsupervised way.
We show that despite using no supervision and no model outputs, our method can recover diverse knowledge represented in large language models.
arXiv Detail & Related papers (2022-12-07T18:17:56Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not well-represent natural language semantics.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- What Artificial Neural Networks Can Tell Us About Human Language Acquisition [47.761188531404066]
Rapid progress in machine learning for natural language processing has the potential to transform debates about how humans learn language.
To increase the relevance of learnability results from computational models, we need to train model learners without significant advantages over humans.
arXiv Detail & Related papers (2022-08-17T00:12:37Z)
- Examining the Inductive Bias of Neural Language Models with Artificial Languages [42.699545862522214]
We propose a novel method for investigating the inductive biases of language models using artificial languages.
This constitutes a fully controlled causal framework, and demonstrates how grammar engineering can serve as a useful tool for analyzing neural models.
arXiv Detail & Related papers (2021-06-02T09:34:32Z)
- Universal linguistic inductive biases via meta-learning [36.43388942327124]
It is unclear which inductive biases can explain observed patterns in language acquisition.
We introduce a framework for giving linguistic inductive biases to a neural network model.
We demonstrate this framework with a case study based on syllable structure.
arXiv Detail & Related papers (2020-06-29T19:15:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.