Investigating Novel Verb Learning in BERT: Selectional Preference
Classes and Alternation-Based Syntactic Generalization
- URL: http://arxiv.org/abs/2011.02417v1
- Date: Wed, 4 Nov 2020 17:17:49 GMT
- Title: Investigating Novel Verb Learning in BERT: Selectional Preference
Classes and Alternation-Based Syntactic Generalization
- Authors: Tristan Thrush, Ethan Wilcox, and Roger Levy
- Abstract summary: We deploy a novel word-learning paradigm to test BERT's few-shot learning capabilities for two aspects of English verbs.
We find that BERT makes robust grammatical generalizations after just one or two instances of a novel word in fine-tuning.
- Score: 22.112988757841467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous studies investigating the syntactic abilities of deep learning
models have not targeted the relationship between the strength of the
grammatical generalization and the amount of evidence to which the model is
exposed during training. We address this issue by deploying a novel
word-learning paradigm to test BERT's few-shot learning capabilities for two
aspects of English verbs: alternations and classes of selectional preferences.
For the former, we fine-tune BERT on a single frame in a verbal-alternation
pair and ask whether the model expects the novel verb to occur in its sister
frame. For the latter, we fine-tune BERT on an incomplete selectional network
of verbal objects and ask whether it expects unattested but plausible
verb/object pairs. We find that BERT makes robust grammatical generalizations
after just one or two instances of a novel word in fine-tuning. For the verbal
alternation tests, we find that the model displays behavior that is consistent
with a transitivity bias: verbs seen few times are expected to take direct
objects, but verbs seen with direct objects are not expected to occur
intransitively.
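As a rough illustration of the paradigm described in the abstract, the sketch below fine-tunes a masked language model on one exposure to a nonce verb in a transitive frame and then compares the model's expectation for that verb in the attested frame versus its sister frame. This is a minimal sketch assuming BERT via the HuggingFace transformers library and PyTorch; it is not the authors' released code, and the nonce verb "blicked", the example frames, and the hyperparameters are illustrative placeholders.

```python
# Minimal sketch of a few-shot novel-verb paradigm, assuming the HuggingFace
# `transformers` library and PyTorch. NOT the authors' code: the nonce verb
# "blicked", the frames, and the hyperparameters are illustrative placeholders.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Register the nonce verb as a new vocabulary item with a fresh embedding.
tokenizer.add_tokens(["blicked"])
model.resize_token_embeddings(len(tokenizer))
nonce_id = tokenizer.convert_tokens_to_ids("blicked")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def expose(sentence):
    """One fine-tuning step: mask the nonce verb and train BERT to predict it."""
    enc = tokenizer(sentence, return_tensors="pt")
    labels = torch.full_like(enc["input_ids"], -100)   # ignore every position...
    is_nonce = enc["input_ids"] == nonce_id
    labels[is_nonce] = nonce_id                        # ...except the nonce verb
    enc["input_ids"][is_nonce] = tokenizer.mask_token_id
    model.train()
    loss = model(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

def nonce_logprob(frame):
    """Log-probability of the nonce verb at the [MASK] slot of a test frame."""
    enc = tokenizer(frame, return_tensors="pt")
    is_mask = enc["input_ids"] == tokenizer.mask_token_id
    model.eval()
    with torch.no_grad():
        logits = model(**enc).logits[is_mask]          # shape: (n_masks, vocab)
    return torch.log_softmax(logits, dim=-1)[0, nonce_id].item()

# One or two exposures to the novel verb in a single (here transitive) frame...
expose("the chef blicked the soup .")

# ...then compare the model's expectation for the attested frame against its
# sister frame in the alternation (here an intransitive/inchoative use).
print("attested transitive frame :", nonce_logprob("the chef [MASK] the soup ."))
print("sister intransitive frame :", nonce_logprob("the soup [MASK] ."))
```

The same logic underlies both experiments described in the abstract: for the alternation tests, expectations in the sister frame are probed after exposure to one member of the pair; for the selectional-preference tests, unattested but plausible verb/object pairs play the role of the held-out frame. A sketch like this only captures the general shape of the protocol, not the controlled stimuli or evaluation metrics of the original study.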
Related papers
- Visually Grounded Speech Models have a Mutual Exclusivity Bias [20.495178526318185]
When children learn new words, they employ constraints such as the mutual exclusivity (ME) bias.
This bias has been studied computationally, but only in models that use discrete word representations as input.
We investigate the ME bias in the context of visually grounded speech models that learn from natural images and continuous speech audio.
arXiv Detail & Related papers (2024-03-20T18:49:59Z)
- Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study [68.75670223005716]
We find that pre-trained language models like BERT have the potential to learn sequentially, even without any sparse memory replay.
Our experiments reveal that BERT can generate high-quality representations for previously learned tasks over the long term, under extremely sparse replay or even no replay.
arXiv Detail & Related papers (2023-03-02T09:03:43Z)
- Suffix Retrieval-Augmented Language Modeling [1.8710230264817358]
Causal language modeling (LM) uses word history to predict the next word.
BERT, on the other hand, makes use of bi-directional word information in a sentence to predict words at masked positions.
We propose a novel model that simulates a bi-directional contextual effect in an autoregressive manner.
arXiv Detail & Related papers (2022-11-06T07:53:19Z)
- GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement [73.73599110214828]
Grounded Situation Recognition (GSR) aims to generate structured semantic summaries of images for "human-like" event understanding.
Inspired by object detection and image captioning tasks, existing methods typically employ a two-stage framework.
We propose a novel two-stage framework that focuses on utilizing the bidirectional relations between verbs and roles.
arXiv Detail & Related papers (2022-08-18T17:13:59Z)
- Noun2Verb: Probabilistic frame semantics for word class conversion [8.939269057094661]
We present a formal framework that simulates the production and comprehension of novel denominal verb usages.
We show that a model where the speaker and listener cooperatively learn the joint distribution over semantic frame elements better explains the empirical denominal verb usages.
arXiv Detail & Related papers (2022-05-12T19:16:12Z)
- Word Order Does Matter (And Shuffled Language Models Know It) [9.990431777927421]
Recent studies have shown that language models pretrained and/or fine-tuned on randomly permuted sentences exhibit competitive performance on GLUE.
We investigate what position embeddings learned from shuffled text encode, showing that these models retain information pertaining to the original, naturalistic word order.
arXiv Detail & Related papers (2022-03-21T14:10:15Z)
- How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness? [121.57551065856164]
We propose Robust Informative Fine-Tuning (RIFT) as a novel adversarial fine-tuning method from an information-theoretical perspective.
RIFT encourages an objective model to retain the features learned from the pre-trained model throughout the entire fine-tuning process.
Experimental results show that RIFT consistently outperforms the state of the art on two popular NLP tasks.
arXiv Detail & Related papers (2021-12-22T05:04:41Z)
- Refining Targeted Syntactic Evaluation of Language Models [6.991281327290524]
Targeted syntactic evaluation (TSE) of subject-verb number agreement in English assesses whether language models rate each grammatical sentence as more likely than its ungrammatical counterpart.
We find that TSE overestimates systematicity of language models, but that models score up to 40% better on verbs that they predict are likely in context.
arXiv Detail & Related papers (2021-04-19T20:55:13Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
- Syntax-Enhanced Pre-trained Model [49.1659635460369]
We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa.
Existing methods utilize the syntax of text either in the pre-training stage or in the fine-tuning stage, and therefore suffer from a discrepancy between the two stages.
We present a model that utilizes the syntax of text in both pre-training and fine-tuning stages.
arXiv Detail & Related papers (2020-12-28T06:48:04Z)
- Investigating Cross-Linguistic Adjective Ordering Tendencies with a Latent-Variable Model [66.84264870118723]
We present the first purely corpus-driven model of multi-lingual adjective ordering in the form of a latent-variable model.
We provide strong converging evidence for the existence of universal, cross-linguistic, hierarchical adjective ordering tendencies.
arXiv Detail & Related papers (2020-10-09T18:27:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.