Fugu-MT Paper Translation (Abstract): Color Names in Vision-Language Models

Paper overview: Color Names in Vision-Language Models

arxiv url: http://arxiv.org/abs/2509.22524v1
Date: Fri, 26 Sep 2025 16:04:18 GMT
Status: Translation complete
Last updated in system: 2025-09-29 20:57:54.570028
Title: Color Names in Vision-Language Models
Title (reference translation): Color Names in Vision-Language Models
Authors: Alexandra Gomez-Villa, Pablo Hernández-Cámara, Muhammad Atif Butt, Valero Laparra, Jesus Malo, Javier Vazquez-Corral
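Abstract summary: We present the first systematic evaluation of color naming capabilities in vision-language models (VLMs). Our results show that while VLMs achieve high accuracy on prototypical colors from classical studies, performance drops significantly on expanded, non-prototypical color sets. We identify 21 common color terms that consistently emerge across all models, revealing two distinct approaches: constrained models using predominantly basic terms versus expansive models employing systematic lightness modifiers.
Reference score (independently computed attention measure): 48.847573209643265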
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Color serves as a fundamental dimension of human visual perception and a primary means of communicating about objects and scenes. As vision-language models (VLMs) become increasingly prevalent, understanding whether they name colors like humans is crucial for effective human-AI interaction. We present the first systematic evaluation of color naming capabilities across VLMs, replicating classic color naming methodologies using 957 color samples across five representative models. Our results show that while VLMs achieve high accuracy on prototypical colors from classical studies, performance drops significantly on expanded, non-prototypical color sets. We identify 21 common color terms that consistently emerge across all models, revealing two distinct approaches: constrained models using predominantly basic terms versus expansive models employing systematic lightness modifiers. Cross-linguistic analysis across nine languages demonstrates severe training imbalances favoring English and Chinese, with hue serving as the primary driver of color naming decisions. Finally, ablation studies reveal that language model architecture significantly influences color naming independent of visual processing capabilities.
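Abstract (reference translation): Color is a fundamental dimension of human visual perception and a primary means of communicating about objects and scenes. As vision-language models (VLMs) become increasingly prevalent, understanding whether they name colors the way humans do is crucial for effective human-AI interaction. This paper presents the first systematic evaluation of color naming capabilities across VLMs, replicating classic color naming methodologies with 957 color samples across five representative models. The results show that while VLMs achieve high accuracy on prototypical colors from classical studies, performance drops significantly on expanded, non-prototypical color sets. The authors identify 21 common color terms that consistently emerge across all models, revealing two distinct approaches: constrained models that use predominantly basic terms, and expansive models that employ systematic lightness modifiers. Cross-linguistic analysis across nine languages demonstrates severe training imbalances favoring English and Chinese, with hue serving as the primary driver of color naming decisions. Finally, ablation studies reveal that language model architecture significantly influences color naming independent of visual processing capabilities.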


Related papers