ScoNe: Benchmarking Negation Reasoning in Language Models With
Fine-Tuning and In-Context Learning
- URL: http://arxiv.org/abs/2305.19426v1
- Date: Tue, 30 May 2023 21:43:11 GMT
- Title: ScoNe: Benchmarking Negation Reasoning in Language Models With
Fine-Tuning and In-Context Learning
- Authors: Jingyuan Selena She, Christopher Potts, Samuel R. Bowman, Atticus
Geiger
- Abstract summary: We use ScoNe-NLI to assess fine-tuning and in-context learning strategies.
For in-context learning, we test InstructGPT models and find that most prompt strategies are not successful.
We extend ScoNe with ScoNe-NLG, a sentence completion test set that embeds negation reasoning in short narratives.
- Score: 28.89678790858097
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A number of recent benchmarks seek to assess how well models handle natural
language negation. However, these benchmarks lack the controlled example
paradigms that would allow us to infer whether a model had learned how negation
morphemes semantically scope. To fill these analytical gaps, we present the
Scoped Negation NLI (ScoNe-NLI) benchmark, which contains contrast sets of six
examples with up to two negations where either zero, one, or both negative
morphemes affect the NLI label. We use ScoNe-NLI to assess fine-tuning and
in-context learning strategies. We find that RoBERTa and DeBERTa models solve
ScoNe-NLI after many-shot fine-tuning. For in-context learning, we test
InstructGPT models and find that most prompt strategies are not successful,
including those using step-by-step reasoning. To better understand this result,
we extend ScoNe with ScoNe-NLG, a sentence completion test set that embeds
negation reasoning in short narratives. Here, InstructGPT is successful, which
reveals the model can correctly reason about negation, but struggles to do so
on prompt-adapted NLI examples outside of its core pretraining regime.
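The contrast-set design described in the abstract (six examples per set, with zero, one, or both negative morphemes scoping over the relevant predicate) can be sketched as a small data structure. The following is a hypothetical illustration only: the sentences, condition names, and labels below are invented for exposition and are not actual ScoNe-NLI items.

```python
# Hypothetical sketch of a ScoNe-NLI-style contrast set.
# All sentences and condition names are invented for illustration.

CONTRAST_SET = {
    # Zero negations: "biscuit" entails "snack" (upward-monotone context).
    "no_negation": {
        "premise": "The dog ate a biscuit.",
        "hypothesis": "The dog ate a snack.",
        "label": "entailment",
    },
    # One negation scoping over the predicate: monotonicity reverses,
    # so the more general "snack" now entails "biscuit" under "not".
    "one_scoped": {
        "premise": "The dog did not eat a snack.",
        "hypothesis": "The dog did not eat a biscuit.",
        "label": "entailment",
    },
    # One negation present but NOT scoping over the edited predicate:
    # the entailment pattern matches the zero-negation case.
    "one_not_scoped": {
        "premise": "The dog, which did not bark, ate a biscuit.",
        "hypothesis": "The dog ate a snack.",
        "label": "entailment",
    },
    # Two negations, both scoping: double negation cancels,
    # restoring the upward-monotone pattern.
    "two_scoped": {
        "premise": "It is not true that the dog did not eat a biscuit.",
        "hypothesis": "The dog ate a snack.",
        "label": "entailment",
    },
    # Control: reversing direction without negation yields "neutral",
    # since "snack" does not entail "biscuit".
    "no_negation_neutral": {
        "premise": "The dog ate a snack.",
        "hypothesis": "The dog ate a biscuit.",
        "label": "neutral",
    },
}

def label_for(condition: str) -> str:
    """Return the gold NLI label for a given contrast-set condition."""
    return CONTRAST_SET[condition]["label"]

print(label_for("one_scoped"))           # entailment
print(label_for("no_negation_neutral"))  # neutral
```

A model that has genuinely learned how negation morphemes scope should flip or preserve its label across these minimally different conditions; a model relying on surface cues (e.g. the mere presence of "not") will fail on the scoped/not-scoped contrasts.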
Related papers
- Revisiting subword tokenization: A case study on affixal negation in large language models [57.75279238091522]
We measure the impact of affixal negation on modern English large language models (LLMs)
We conduct experiments using LLMs with different subword tokenization methods.
We show that models can, on the whole, reliably recognize the meaning of affixal negation.
arXiv Detail & Related papers (2024-04-03T03:14:27Z)
- No Strong Feelings One Way or Another: Re-operationalizing Neutrality in Natural Language Inference [6.485890157501745]
Natural Language Inference (NLI) has been a cornerstone task in evaluating language models' inferential reasoning capabilities.
The standard three-way classification scheme used in NLI has well-known shortcomings in evaluating models' ability to capture the nuances of natural human reasoning.
We argue that the operationalization of the neutral label in current NLI datasets has low validity, is interpreted inconsistently, and that at least one important sense of neutrality is often ignored.
arXiv Detail & Related papers (2023-06-16T15:45:08Z)
- Language models are not naysayers: An analysis of language models on negation benchmarks [58.32362243122714]
We evaluate the ability of current-generation auto-regressive language models to handle negation.
We show that LLMs have several limitations including insensitivity to the presence of negation, an inability to capture the lexical semantics of negation, and a failure to reason under negation.
arXiv Detail & Related papers (2023-06-14T01:16:37Z)
- Can large language models generate salient negative statements? [18.577880767789097]
We examine the ability of large language models to generate salient (interesting) negative statements about real-world entities.
We probe the LLMs using zero- and k-shot unconstrained probes, and compare with traditional methods for negation generation.
We measure the correctness and salience of the generated lists about subjects from different domains.
arXiv Detail & Related papers (2023-05-26T09:13:59Z)
- Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation [59.307534363825816]
Negation is poorly captured by current language models, although the extent of this problem is not widely understood.
We introduce a natural language inference (NLI) test suite to enable probing the capabilities of NLP methods.
arXiv Detail & Related papers (2022-10-06T23:39:01Z)
- Improving negation detection with negation-focused pre-training [58.32362243122714]
Negation is a common linguistic feature that is crucial in many language understanding tasks.
Recent work has shown that state-of-the-art NLP models underperform on samples containing negation.
We propose a new negation-focused pre-training strategy, involving targeted data augmentation and negation masking.
arXiv Detail & Related papers (2022-05-09T02:41:11Z)
- Investigating the Role of Negatives in Contrastive Representation Learning [59.30700308648194]
Noise contrastive learning is a popular technique for unsupervised representation learning.
We focus on disambiguating the role of one of these parameters: the number of negative examples.
We find that the results broadly agree with our theory, while our vision experiments are murkier with performance sometimes even being insensitive to the number of negatives.
arXiv Detail & Related papers (2021-06-18T06:44:16Z)
- Understanding by Understanding Not: Modeling Negation in Language Models [81.21351681735973]
Negation is a core construction in natural language.
We propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences.
We reduce the mean top-1 error rate to 4% on the negated LAMA dataset.
arXiv Detail & Related papers (2021-05-07T21:58:35Z)
- An Analysis of the Utility of Explicit Negative Examples to Improve the Syntactic Abilities of Neural Language Models [32.183409062294466]
We explore the utilities of explicit negative examples in training neural language models.
We find that even with our direct learning signals, the models still struggle to resolve agreement across an object-relative clause.
arXiv Detail & Related papers (2020-04-06T07:47:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.