Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding
- URL: http://arxiv.org/abs/2506.14397v2
- Date: Wed, 18 Jun 2025 01:18:11 GMT
- Title: Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding
- Authors: Yeonkyoung So, Gyuseong Lee, Sungmok Jung, Joonhak Lee, JiA Kang, Sangho Kim, Jaejin Lee
- Abstract summary: Negation is a fundamental linguistic phenomenon that poses persistent challenges for Large Language Models. Existing benchmarks often treat negation as a side case within broader tasks like natural language inference. We introduce Thunder-NUBench, a novel benchmark explicitly designed to assess sentence-level negation understanding in LLMs.
- Score: 4.9301587184653295
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Negation is a fundamental linguistic phenomenon that poses persistent challenges for Large Language Models (LLMs), particularly in tasks requiring deep semantic understanding. Existing benchmarks often treat negation as a side case within broader tasks like natural language inference, resulting in a lack of benchmarks that exclusively target negation understanding. In this work, we introduce Thunder-NUBench, a novel benchmark explicitly designed to assess sentence-level negation understanding in LLMs. Thunder-NUBench goes beyond surface-level cue detection by contrasting standard negation with structurally diverse alternatives such as local negation, contradiction, and paraphrase. The benchmark consists of manually curated sentence-negation pairs and a multiple-choice dataset that enables in-depth evaluation of models' negation understanding.
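To make the multiple-choice format concrete, the sketch below shows how one such sentence-negation item might be represented and scored. The `NegationItem` schema, its field names, the example sentences, and the `accuracy` helper are all illustrative assumptions, not the benchmark's actual data format or evaluation code.

```python
# Hypothetical sketch of a Thunder-NUBench-style multiple-choice item.
# Field names and contents are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class NegationItem:
    sentence: str            # original affirmative sentence
    options: dict[str, str]  # candidate rewrites keyed by relation type
    answer: str              # key of the true sentence-level negation

item = NegationItem(
    sentence="The committee approved every proposal.",
    options={
        "negation": "The committee did not approve every proposal.",
        "local_negation": "The committee approved no proposal.",    # negates only a constituent
        "contradiction": "The committee rejected every proposal.",  # incompatible, but not the negation
        "paraphrase": "Every proposal was approved by the committee.",
    },
    answer="negation",
)

def accuracy(items: list[NegationItem], predict) -> float:
    """Fraction of items where `predict` (e.g., an LLM prompted with the
    sentence and shuffled options) picks the true negation's key."""
    return sum(1 for it in items if predict(it) == it.answer) / len(items)

# Trivial baseline for demonstration: always guessing "paraphrase" scores 0.0 here.
print(accuracy([item], lambda it: "paraphrase"))
```

The point of such distractors is that all of them either carry negation-like cues or overlap heavily with the original sentence, so surface-level cue matching alone cannot identify the true negation.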
Related papers
- Vision-Language Models Do Not Understand Negation [50.27667000027403]
NegBench is a benchmark designed to evaluate negation understanding across 18 task variations and 79k examples. We show that the authors' proposed approach can result in a 10% increase in recall on negated queries and a 28% boost in accuracy on multiple-choice questions with negated captions.
arXiv Detail & Related papers (2025-01-16T09:55:42Z)
- Revisiting subword tokenization: A case study on affixal negation in large language models [57.75279238091522]
We measure the impact of affixal negation on modern English large language models (LLMs)
We conduct experiments using LLMs with different subword tokenization methods.
We show that models can, on the whole, reliably recognize the meaning of affixal negation.
arXiv Detail & Related papers (2024-04-03T03:14:27Z)
- This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models [4.017326849033009]
We try to clarify the reasons for the sub-optimal performance of large language models in understanding negation.
We introduce a large semi-automatically generated dataset of circa 400,000 descriptive sentences about commonsense knowledge.
We have used our dataset with the largest available open LLMs in a zero-shot approach to assess their generalization and inference capabilities.
arXiv Detail & Related papers (2023-10-24T15:38:21Z)
- Language models are not naysayers: An analysis of language models on negation benchmarks [58.32362243122714]
We evaluate the ability of current-generation auto-regressive language models to handle negation.
We show that LLMs have several limitations including insensitivity to the presence of negation, an inability to capture the lexical semantics of negation, and a failure to reason under negation.
arXiv Detail & Related papers (2023-06-14T01:16:37Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation [59.307534363825816]
Negation is poorly captured by current language models, although the extent of this problem is not widely understood.
We introduce a natural language inference (NLI) test suite to enable probing the capabilities of NLP methods.
arXiv Detail & Related papers (2022-10-06T23:39:01Z)
- Improving negation detection with negation-focused pre-training [58.32362243122714]
Negation is a common linguistic feature that is crucial in many language understanding tasks.
Recent work has shown that state-of-the-art NLP models underperform on samples containing negation.
We propose a new negation-focused pre-training strategy, involving targeted data augmentation and negation masking (a minimal sketch of the masking step follows this list).
arXiv Detail & Related papers (2022-05-09T02:41:11Z)
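As referenced in the last entry above, here is a minimal sketch of the negation-masking idea: during data preparation, negation cues are replaced with a mask token so the model is forced to learn to predict them. The cue list, mask token, and function are illustrative assumptions; the paper's actual procedure may differ (for instance, contracted forms like "n't" are not handled here).

```python
# Illustrative sketch of negation masking for pre-training data preparation.
# Cue list and mask token are assumptions; contractions ("n't") are ignored.
import re

NEGATION_CUES = {"not", "no", "never", "none", "neither", "nor", "without"}
MASK = "[MASK]"

def mask_negation_cues(text: str) -> str:
    """Replace whole-word negation cues with the mask token."""
    def repl(m: re.Match) -> str:
        word = m.group(0)
        return MASK if word.lower() in NEGATION_CUES else word
    return re.sub(r"\b\w+\b", repl, text)

print(mask_negation_cues("She did not go, and he never called."))
# -> She did [MASK] go, and he [MASK] called.
```

Training on such masked text alongside the usual random masking biases the model toward representing negation cues rather than treating them as noise.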