Fugu-MT Paper Translation (Summary): When Negation Is a Geometry Problem in Vision-Language Models

Paper summary: When Negation Is a Geometry Problem in Vision-Language Models

arxiv url: http://arxiv.org/abs/2603.20554v1
Date: Fri, 20 Mar 2026 23:06:23 GMT
Status: Translation complete
Last updated in system: 2026-03-24 19:11:38.966758
Title: When Negation Is a Geometry Problem in Vision-Language Models
Title (reference translation): When negation is a geometric problem in vision-language models
Authors: Fawaz Sammani, Tzoulio Chamiti, Paul Gavrikov, Nikos Deligiannis,
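Abstract summary: Joint vision-language embedding models such as CLIP typically fail to understand negation in text queries. The paper investigates an alternative evaluation framework based on multimodal LLMs-as-a-judge, which excel at understanding simple yes/no questions about image content.
Reference score (attention score computed independently by this site): 32.51815690470519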
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Joint Vision-Language Embedding models such as CLIP typically fail at understanding negation in text queries - for example, failing to distinguish "no" in the query: "a plain blue shirt with no logos". Prior work has largely addressed this limitation through data-centric approaches, fine-tuning CLIP on large-scale synthetic negation datasets. However, these efforts are commonly evaluated using retrieval-based metrics that cannot reliably reflect whether negation is actually understood. In this paper, we identify two key limitations of such evaluation metrics and investigate an alternative evaluation framework based on Multimodal LLMs-as-a-judge, which typically excel at understanding simple yes/no questions about image content, providing a fair evaluation of negation understanding in CLIP models. We then ask whether there already exists a direction in the CLIP embedding space associated with negation. We find evidence that such a direction exists, and show that it can be manipulated through test-time intervention via representation engineering to steer CLIP toward negation-aware behavior without any fine-tuning. Finally, we test negation understanding on non-common image-text samples to evaluate generalization under distribution shifts.
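Abstract (reference translation): Joint vision-language embedding models such as CLIP typically fail to understand negation in text queries. Prior work has addressed this limitation through data-centric approaches, fine-tuning CLIP on large-scale synthetic negation datasets. However, these efforts are commonly evaluated with retrieval-based metrics that cannot reliably reflect whether negation is actually understood. This paper identifies two key limitations of these evaluation metrics and investigates an alternative evaluation framework based on multimodal LLMs-as-a-judge. It then asks whether a direction associated with negation already exists in the CLIP embedding space. The authors find evidence that such a direction exists and show that it can be manipulated through test-time intervention via representation engineering to steer CLIP without any fine-tuning. Finally, they test negation understanding on uncommon image-text samples to evaluate generalization under distribution shift.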
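The abstract does not say how the negation direction is found or applied, so the following is only a minimal sketch of one common representation-engineering baseline: a difference-of-means direction that is added to a query embedding at test time. Everything here is an assumption made for illustration: the function names (negation_direction, steer), the strength parameter alpha, and the synthetic vectors that stand in for real CLIP text embeddings of affirmative and negated captions.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length, as CLIP similarities use unit vectors."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(col) / count for col in zip(*vectors)]

def negation_direction(affirmative, negated):
    """Unit vector from the affirmative mean to the negated mean (assumed method)."""
    mu_aff, mu_neg = mean(affirmative), mean(negated)
    return normalize([n - a for n, a in zip(mu_neg, mu_aff)])

def steer(embedding, direction, alpha):
    """Test-time intervention: shift along the direction, then re-normalize."""
    return normalize([e + alpha * d for e, d in zip(embedding, direction)])

# Synthetic stand-ins for text embeddings (hypothetical data, not CLIP outputs).
rng = random.Random(0)
dim = 16

def random_unit():
    return normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])

true_dir = random_unit()  # planted "negation" axis for this toy example
affirmative = [random_unit() for _ in range(200)]
negated = [normalize([a + 0.8 * t for a, t in zip(vec, true_dir)])
           for vec in affirmative]

direction = negation_direction(affirmative, negated)
query = random_unit()
steered = steer(query, direction, alpha=0.5)

print("alignment with planted axis:", dot(direction, true_dir))
print("projection before/after:", dot(query, direction), dot(steered, direction))
```

With real embeddings, the two input sets would come from a text encoder applied to paired affirmative and negated captions, and alpha would need tuning; whether the paper uses this estimator or a different one is not stated in the abstract.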

Related papers

What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging [42.41372222021938]
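State-of-the-art vision-language models (VLMs) suffer from a critical failure in understanding negation, often referred to as affirmative bias. CoVAND is a dataset constructed with a systematic chain-of-thought (CoT) and VQA-based pipeline to generate high-quality, instance-grounded negation data. Second, the authors propose NegToMe, a novel text token merging module that directly addresses the architectural cause of affirmative bias.
Reference translation of paper (metadata) (2025-10-15T07:36:38Z)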
Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding [4.9301587184653295]
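Negation is a fundamental linguistic phenomenon that poses a persistent challenge for large language models. Existing benchmarks often treat negation as a side case within broader tasks such as natural language inference. This paper introduces Thunder-NUBench, a novel benchmark designed to evaluate sentence-level negation understanding in LLMs.
Reference translation of paper (metadata) (2025-06-17T10:51:39Z)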
Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP [57.33324843049638]
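This paper introduces data generation pipelines that use a large language model (LLM) and a multimodal LLM to produce captions that include negation. By fine-tuning CLIP on data generated by the pipelines, the authors develop NegationCLIP, which enhances negation awareness while preserving generality. Experiments on various CLIP architectures validate the effectiveness of the data generation pipelines in improving CLIP's ability to perceive negation accurately.
Reference translation of paper (metadata) (2025-01-19T01:17:05Z)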
Vision-Language Models Do Not Understand Negation [50.27667000027403]
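NegBench is a benchmark designed to evaluate negation understanding across 18 task variations and 79k examples. The proposed approach yields a 10% increase in recall on negated queries and a 28% boost in accuracy on multiple-choice questions with negated captions.
Reference translation of paper (metadata) (2025-01-16T09:55:42Z)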
Revisiting subword tokenization: A case study on affixal negation in large language models [57.75279238091522]
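The paper measures the impact of affixal negation on modern English large language models (LLMs). The authors conduct experiments using LLMs with different subword tokenization methods. They show that the models can, on the whole, reliably recognize the meaning of affixal negation.
Reference translation of paper (metadata) (2024-04-03T03:14:27Z)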
Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation [59.307534363825816]
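Negation is poorly handled by current language models, although the extent of this problem is not widely understood. The authors introduce a natural language inference (NLI) test suite to probe the capabilities of NLP methods.
Reference translation of paper (metadata) (2022-10-06T23:39:01Z)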

The related papers list is generated automatically from the titles and abstracts of papers on this site.
