Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition
- URL: http://arxiv.org/abs/2004.03066v2
- Date: Tue, 14 Jul 2020 01:17:37 GMT
- Title: Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition
- Authors: Paloma Jeretic, Alex Warstadt, Suvrat Bhooshan, Adina Williams
- Abstract summary: Natural language inference (NLI) is an increasingly important task for natural language understanding.
The ability of NLI models to make pragmatic inferences remains understudied.
We evaluate whether BERT, InferSent, and BOW NLI models trained on MultiNLI learn to make pragmatic inferences.
- Score: 17.642255516887968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language inference (NLI) is an increasingly important task for
natural language understanding, which requires one to infer whether a sentence
entails another. However, the ability of NLI models to make pragmatic
inferences remains understudied. We create an IMPlicature and PRESupposition
diagnostic dataset (IMPPRES), consisting of >25k semiautomatically generated
sentence pairs illustrating well-studied pragmatic inference types. We use
IMPPRES to evaluate whether BERT, InferSent, and BOW NLI models trained on
MultiNLI (Williams et al., 2018) learn to make pragmatic inferences. Although
MultiNLI appears to contain very few pairs illustrating these inference types,
we find that BERT learns to draw pragmatic inferences. It reliably treats
scalar implicatures triggered by "some" as entailments. For some presupposition
triggers like "only", BERT reliably recognizes the presupposition as an
entailment, even when the trigger is embedded under an entailment canceling
operator like negation. BOW and InferSent show weaker evidence of pragmatic
reasoning. We conclude that NLI training encourages models to learn some, but
not all, pragmatic inferences.
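As an illustration of the kind of evaluation the paper describes (this is a minimal sketch, not the authors' released code), the snippet below probes a MultiNLI-fine-tuned BERT model on a scalar-implicature pair in the style of IMPPRES. It assumes the Hugging Face transformers library; the checkpoint name and the example sentence pair are illustrative assumptions rather than items drawn from the dataset.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed publicly available BERT checkpoint fine-tuned on MultiNLI.
MODEL_NAME = "textattack/bert-base-uncased-MNLI"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

# "some" pragmatically implicates "not all": a model that has learned the
# scalar implicature should label this pair as entailment, while a purely
# logical reading would call it neutral.
premise = "Jo ate some of the cookies."
hypothesis = "Jo did not eat all of the cookies."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Label order differs across checkpoints, so read it from the model config.
pred_id = logits.argmax(dim=-1).item()
print("Predicted relation:", model.config.id2label[pred_id])

Running the same probe with the trigger embedded under negation (e.g. "Jo did not eat some of the cookies") is the kind of manipulation the paper uses to test whether an inference survives an entailment-canceling operator.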
Related papers
- Deep Natural Language Feature Learning for Interpretable Prediction [1.6114012813668932]
We propose a method to break down a main complex task into a set of intermediary easier sub-tasks.
Our method allows for representing each example by a vector consisting of the answers to these questions.
We have successfully applied this method to two completely different tasks: detecting incoherence in students' answers to open-ended mathematics exam questions, and screening abstracts for a systematic literature review of scientific papers on climate change and agroecology.
arXiv Detail & Related papers (2023-11-09T21:43:27Z)
- All Roads Lead to Rome? Exploring the Invariance of Transformers' Representations [69.3461199976959]
We propose a model based on invertible neural networks, BERT-INN, to learn the Bijection Hypothesis.
We show the advantage of BERT-INN both theoretically and through extensive experiments.
arXiv Detail & Related papers (2023-05-23T22:30:43Z)
- Beyond Distributional Hypothesis: Let Language Models Learn Meaning-Text Correspondence [45.9949173746044]
We show that large-size pre-trained language models (PLMs) do not satisfy the logical negation property (LNP).
We propose a novel intermediate training task, named meaning-matching, designed to directly learn a meaning-text correspondence.
We find that the task enables PLMs to learn lexical semantic information.
arXiv Detail & Related papers (2022-05-08T08:37:36Z)
- Does BERT really agree? Fine-grained Analysis of Lexical Dependence on a Syntactic Task [70.29624135819884]
We study the extent to which BERT is able to perform lexically-independent subject-verb number agreement (NA) on targeted syntactic templates.
Our results on nonce sentences suggest that the model generalizes well for simple templates, but fails to perform lexically-independent syntactic generalization when as little as one attractor is present.
arXiv Detail & Related papers (2022-04-14T11:33:15Z)
- How Does Adversarial Fine-Tuning Benefit BERT? [16.57274211257757]
Adversarial training is one of the most reliable methods for defending against adversarial attacks in machine learning.
We show that adversarially fine-tuned models remain more faithful to BERT's language modeling behavior and are more sensitive to the word order.
Our analysis demonstrates that vanilla fine-tuning oversimplifies the sentence representation by focusing heavily on one or a few label-indicative words.
arXiv Detail & Related papers (2021-08-31T03:39:06Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- On the Sentence Embeddings from Pre-trained Language Models [78.45172445684126]
In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited.
We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance of semantic similarity.
We propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective.
arXiv Detail & Related papers (2020-11-02T13:14:57Z)
- GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method [29.352569563032056]
We propose a novel method to explicitly inject linguistic knowledge in the form of word embeddings into a pre-trained BERT.
Our performance improvements on multiple semantic similarity datasets when injecting dependency-based and counter-fitted embeddings indicate that such information is beneficial and currently missing from the original model.
arXiv Detail & Related papers (2020-10-23T17:00:26Z)
- Syntactic Structure Distillation Pretraining For Bidirectional Encoders [49.483357228441434]
We introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining.
We distill the approximate marginal distribution over words in context from the syntactic LM.
Our findings demonstrate the benefits of syntactic biases, even in representation learners that exploit large amounts of data.
arXiv Detail & Related papers (2020-05-27T16:44:01Z)