Fugu-MT Paper Translation (Abstract): When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don't

Paper abstract: When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don't

arxiv url: http://arxiv.org/abs/2604.06422v1
Date: Tue, 07 Apr 2026 19:59:45 GMT
Status: Translation complete
System update date: 2026-04-09 17:30:51.221659
Title: When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don't
Title (reference translation): When Calling an Apple Red: Humans Follow the Rules of Introspection, VLMs Do Not
Authors: Jonathan Nemitz, Carsten Eickhoff, Junyi Jessy Li, Kyle Mahowald, Michal Golovanevsky, William Rudman
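Abstract summary: The authors introduce the Graded Color Attribution dataset to elicit decision rules and evaluate faithfulness to those rules. They find that models systematically violate their own introspective rules. The results challenge the view that VLM reasoning failures are difficulty-driven and suggest that VLM introspective self-knowledge is miscalibrated.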
Reference score (independently computed attention score): 48.4091438200409
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding when Vision-Language Models (VLMs) will behave unexpectedly, whether models can reliably predict their own behavior, and if models adhere to their introspective reasoning are central challenges for trustworthy deployment. To study this, we introduce the Graded Color Attribution (GCA) dataset, a controlled benchmark designed to elicit decision rules and evaluate participant faithfulness to these rules. GCA consists of line drawings that vary pixel-level color coverage across three conditions: world-knowledge recolorings, counterfactual recolorings, and shapes with no color priors. Using GCA, both VLMs and human participants establish a threshold: the minimum percentage of pixels of a given color an object must have to receive that color label. We then compare these rules with their subsequent color attribution decisions. Our findings reveal that models systematically violate their own introspective rules. For example, GPT-5-mini violates its stated introspection rules in nearly 60% of cases on objects with strong color priors. Human participants remain faithful to their stated rules, with any apparent violations being explained by a well-documented tendency to overestimate color coverage. In contrast, we find that VLMs are excellent estimators of color coverage, yet blatantly contradict their own reasoning in their final responses. Across all models and strategies for eliciting introspective rules, world-knowledge priors systematically degrade faithfulness in ways that do not mirror human cognition. Our findings challenge the view that VLM reasoning failures are difficulty-driven and suggest that VLM introspective self-knowledge is miscalibrated, with direct implications for high-stakes deployment.
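Abstract (reference translation): Understanding when Vision-Language Models (VLMs) behave unexpectedly, whether models can reliably predict their own behavior, and whether models adhere to their introspective reasoning is a key challenge for trustworthy deployment. This work proposes the GCA dataset, a controlled benchmark for eliciting decision rules and evaluating participants' faithfulness to those rules. GCA consists of line drawings that vary pixel-level color coverage across three conditions: world-knowledge recolorings, counterfactual recolorings, and shapes with no color priors. Using GCA, both VLMs and human participants establish a threshold: the minimum percentage of pixels of a given color that an object must have to receive that color label. These rules are then compared with the subsequent color attribution decisions. The results show that models systematically violate their own introspective rules. For example, GPT-5-mini violates its stated introspection rules in nearly 60% of cases on objects with strong color priors. Human participants stay faithful to their stated rules, and apparent violations are explained by a well-documented tendency to overestimate color coverage. In contrast, VLMs are excellent estimators of color coverage, yet contradict their own reasoning in their final responses. Across all models and strategies for eliciting introspective rules, world-knowledge priors systematically degrade faithfulness in ways that do not mirror human cognition. These results challenge the view that VLM reasoning failures are difficulty-driven and suggest that VLM introspective self-knowledge is miscalibrated, with direct implications for high-stakes deployment.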
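
Illustrative sketch (not from the paper): the abstract describes eliciting a coverage threshold and then checking later color labels against it. The Python below shows one way such a faithfulness check could be computed. The names `Trial`, `rule_predicts_label`, and `violation_rate`, the "coverage at or above threshold means the label applies" reading of the rule, and the sample data are all assumptions made for illustration; the paper's actual metric may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Trial:
    """One hypothetical color-attribution decision."""
    coverage_pct: float  # share of object pixels in the target color (0-100)
    said_color: bool     # True if the participant applied the color label


def rule_predicts_label(threshold_pct: float, coverage_pct: float) -> bool:
    # Assumed reading of the stated rule: the label applies once coverage
    # reaches the stated minimum percentage.
    return coverage_pct >= threshold_pct


def violation_rate(threshold_pct: float, trials: Sequence[Trial]) -> float:
    """Fraction of trials whose label disagrees with the stated threshold."""
    if not trials:
        raise ValueError("at least one trial is required")
    violations = sum(
        1
        for t in trials
        if t.said_color != rule_predicts_label(threshold_pct, t.coverage_pct)
    )
    return violations / len(trials)


# Made-up data: a participant states a 50% threshold but applies the
# color label at every coverage level.
trials = [Trial(20, True), Trial(40, True), Trial(60, True), Trial(80, True)]
rate = violation_rate(50.0, trials)
print(f"violation rate: {rate:.2f}")
```

In this made-up case the labels at 20% and 40% coverage contradict the stated 50% rule, so half of the trials count as violations. A participant whose labels always match the rule would score 0.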


Related papers