Vision-Language Models Do Not Understand Negation
- URL: http://arxiv.org/abs/2501.09425v1
- Date: Thu, 16 Jan 2025 09:55:42 GMT
- Title: Vision-Language Models Do Not Understand Negation
- Authors: Kumail Alhamoud, Shaden Alshammari, Yonglong Tian, Guohao Li, Philip Torr, Yoon Kim, Marzyeh Ghassemi,
- Abstract summary: NegBench is a benchmark designed to evaluate negation understanding across 18 task variations and 79k examples spanning image, video, and medical datasets.
We show that this approach can result in a 10% increase in recall on negated queries and a 40% boost in accuracy on multiple-choice questions with negated captions.
- Score: 50.27667000027403
- License:
- Abstract: Many practical vision-language applications require models that understand negation, e.g., when using natural language to retrieve images which contain certain objects but not others. Despite advancements in vision-language models (VLMs) through large-scale training, their ability to comprehend negation remains underexplored. This study addresses the question: how well do current VLMs understand negation? We introduce NegBench, a new benchmark designed to evaluate negation understanding across 18 task variations and 79k examples spanning image, video, and medical datasets. The benchmark consists of two core tasks designed to evaluate negation understanding in diverse multimodal settings: Retrieval with Negation and Multiple Choice Questions with Negated Captions. Our evaluation reveals that modern VLMs struggle significantly with negation, often performing at chance level. To address these shortcomings, we explore a data-centric approach wherein we finetune CLIP models on large-scale synthetic datasets containing millions of negated captions. We show that this approach can result in a 10% increase in recall on negated queries and a 40% boost in accuracy on multiple-choice questions with negated captions.
Related papers
- NeIn: Telling What You Don't Want [6.666707176043472]
Negation is a fundamental linguistic concept used by humans to convey information that they do not desire.
One barrier to achieving human-level intelligence is the lack of a standard collection by which research into negation can be evaluated.
This paper presents the first large-scale dataset, Negative Instruction (NeIn), for studying negation within the vision-language domain.
arXiv Detail & Related papers (2024-09-09T04:54:34Z) - How and where does CLIP process negation? [2.5600000778964294]
We build on the existence task from the VALSE benchmark to test models' understanding of negation.
We take inspiration from the literature on model interpretability to explain the behaviour of VL models on the understanding of negation.
arXiv Detail & Related papers (2024-07-15T07:20:06Z) - Revisiting subword tokenization: A case study on affixal negation in large language models [57.75279238091522]
We measure the impact of affixal negation on modern English large language models (LLMs)
We conduct experiments using LLMs with different subword tokenization methods.
We show that models can, on the whole, reliably recognize the meaning of affixal negation.
arXiv Detail & Related papers (2024-04-03T03:14:27Z) - This is not a Dataset: A Large Negation Benchmark to Challenge Large
Language Models [4.017326849033009]
We try to clarify the reasons for the sub-optimal performance of large language models understanding negation.
We introduce a large semi-automatically generated dataset of circa 400,000 descriptive sentences about commonsense knowledge.
We have used our dataset with the largest available open LLMs in a zero-shot approach to grasp their generalization and inference capability.
arXiv Detail & Related papers (2023-10-24T15:38:21Z) - Language models are not naysayers: An analysis of language models on
negation benchmarks [58.32362243122714]
We evaluate the ability of current-generation auto-regressive language models to handle negation.
We show that LLMs have several limitations including insensitivity to the presence of negation, an inability to capture the lexical semantics of negation, and a failure to reason under negation.
arXiv Detail & Related papers (2023-06-14T01:16:37Z) - CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about
Negation [21.56001677478673]
We present the first English reading comprehension dataset which requires reasoning about the implications of negated statements in paragraphs.
CONDAQA features 14,182 question-answer pairs with over 200 unique negation cues.
The best performing model on CONDAQA (UnifiedQA-v2-3b) achieves only 42% on our consistency metric, well below human performance which is 81%.
arXiv Detail & Related papers (2022-11-01T06:10:26Z) - Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal
Negation [59.307534363825816]
Negation is poorly captured by current language models, although the extent of this problem is not widely understood.
We introduce a natural language inference (NLI) test suite to enable probing the capabilities of NLP methods.
arXiv Detail & Related papers (2022-10-06T23:39:01Z) - Improving negation detection with negation-focused pre-training [58.32362243122714]
Negation is a common linguistic feature that is crucial in many language understanding tasks.
Recent work has shown that state-of-the-art NLP models underperform on samples containing negation.
We propose a new negation-focused pre-training strategy, involving targeted data augmentation and negation masking.
arXiv Detail & Related papers (2022-05-09T02:41:11Z) - Understanding by Understanding Not: Modeling Negation in Language Models [81.21351681735973]
Negation is a core construction in natural language.
We propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences.
We reduce the mean top1 error rate to 4% on the negated LAMA dataset.
arXiv Detail & Related papers (2021-05-07T21:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.