Color Names in Vision-Language Models
- URL: http://arxiv.org/abs/2509.22524v1
- Date: Fri, 26 Sep 2025 16:04:18 GMT
- Title: Color Names in Vision-Language Models
- Authors: Alexandra Gomez-Villa, Pablo Hernández-Cámara, Muhammad Atif Butt, Valero Laparra, Jesus Malo, Javier Vazquez-Corral
- Abstract summary: We present the first systematic evaluation of color naming capabilities across vision-language models (VLMs).
Our results show that while VLMs achieve high accuracy on colors from classical studies, performance drops significantly on expanded, non-prototypical color sets.
We identify 21 common color terms that consistently emerge across all models, revealing two distinct approaches.
- Score: 48.847573209643265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Color serves as a fundamental dimension of human visual perception and a primary means of communicating about objects and scenes. As vision-language models (VLMs) become increasingly prevalent, understanding whether they name colors like humans is crucial for effective human-AI interaction. We present the first systematic evaluation of color naming capabilities across VLMs, replicating classic color naming methodologies using 957 color samples across five representative models. Our results show that while VLMs achieve high accuracy on prototypical colors from classical studies, performance drops significantly on expanded, non-prototypical color sets. We identify 21 common color terms that consistently emerge across all models, revealing two distinct approaches: constrained models using predominantly basic terms versus expansive models employing systematic lightness modifiers. Cross-linguistic analysis across nine languages demonstrates severe training imbalances favoring English and Chinese, with hue serving as the primary driver of color naming decisions. Finally, ablation studies reveal that language model architecture significantly influences color naming independent of visual processing capabilities.
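The evaluation protocol described in the abstract reduces to a simple loop: render a uniform color sample, ask the model for a single color name, and tally the terms that emerge. Below is a minimal sketch of that loop; `query_vlm`, the sample list, and the prompt wording are hypothetical stand-ins for illustration, not the authors' exact setup.

```python
# Minimal sketch of a color-naming evaluation loop for a VLM.
# `query_vlm` is a hypothetical stand-in for any VLM API; replace it
# with a real call (e.g., a local LLaVA or a hosted chat model).
from collections import Counter
from PIL import Image

def query_vlm(image: Image.Image) -> str:
    """Hypothetical VLM call: return a single color name for the image."""
    return "red"  # placeholder so the sketch runs end to end

def color_patch(rgb, size=224):
    """Render a uniform color sample, as in classic color-naming surveys."""
    return Image.new("RGB", (size, size), rgb)

# A few illustrative samples; the paper evaluates 957 spanning the gamut.
samples = [(255, 0, 0), (128, 128, 0), (70, 130, 180)]
names = Counter(query_vlm(color_patch(rgb)) for rgb in samples)
print(names.most_common())  # tally of emergent color terms per model
```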
Related papers
- ColorConceptBench: A Benchmark for Probabilistic Color-Concept Understanding in Text-to-Image Models [20.130253460357547]
We introduce ColorConceptBench, a new human-annotated benchmark to evaluate color-concept associations.
Our evaluation of seven leading text-to-image (T2I) models reveals that current models lack sensitivity to abstract semantics.
This demonstrates that achieving human-like color semantics requires more than larger models.
arXiv Detail & Related papers (2026-01-23T15:36:02Z)
- COLIBRI Fuzzy Model: Color Linguistic-Based Representation and Interpretation [0.0]
This paper introduces the Human Perception-Based Fuzzy Color Model, COLIBRI, to bridge the gap between computational color representations and human visual perception.
The proposed model uses fuzzy sets and logic to create a framework for color categorization.
Our findings are significant for fields such as design, artificial intelligence, marketing, and human-computer interaction.
arXiv Detail & Related papers (2025-07-15T17:01:45Z)
- Color in Visual-Language Models: CLIP deficiencies [1.0159205678719043]
This work explores how color is encoded in CLIP (Contrastive Language-Image Pre-training), currently the most influential VLM (vision-language model) in artificial intelligence.
We come across two main deficiencies: (a) a clear bias toward achromatic stimuli that are poorly related to the color concept, and (b) a tendency to prioritize text over other visual information.
arXiv Detail & Related papers (2025-02-06T19:38:12Z)
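In the spirit of the CLIP deficiencies reported above, one can probe how CLIP scores a uniform color patch against candidate color-name prompts using the standard Hugging Face API. The checkpoint and prompt template below are common defaults, not the paper's exact configuration.

```python
# Hedged probe of CLIP's color encoding: score an achromatic patch
# against candidate color-name prompts and inspect the ranking.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

names = ["red", "green", "blue", "gray", "white", "black"]
patch = Image.new("RGB", (224, 224), (128, 128, 128))  # achromatic stimulus

inputs = processor(text=[f"a {n} color" for n in names],
                   images=patch, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
for name, p in sorted(zip(names, probs.tolist()), key=lambda x: -x[1]):
    print(f"{name}: {p:.3f}")  # check whether achromatic patches score oddly
```

Sweeping the patch color over a gamut of RGB values turns this into a simple diagnostic of the achromatic bias the paper describes.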
- L-C4: Language-Based Video Colorization for Creative and Consistent Color [59.069498113050436]
We present Language-based video colorization for Creative and Consistent Colors (L-C4).
Our model is built upon a pre-trained cross-modality generative model.
We propose temporally deformable attention to prevent flickering or color shifts, and cross-clip fusion to maintain long-term color consistency.
arXiv Detail & Related papers (2024-10-07T12:16:21Z)
- ColorFoil: Investigating Color Blindness in Large Vision and Language Models [0.0]
We introduce a novel V&L benchmark - ColorFoil.
We evaluate seven state-of-the-art V&L models including CLIP, ViLT, GroupViT, and BridgeTower.
arXiv Detail & Related papers (2024-05-19T22:04:57Z)
- Generation Of Colors using Bidirectional Long Short Term Memory Networks [0.0]
Human vision can distinguish a vast spectrum of colours, estimated at between 2 and 7 million discernible shades.
This research endeavors to bridge the gap between our visual perception of countless shades and our ability to articulate and name them accurately.
arXiv Detail & Related papers (2023-11-11T11:35:37Z)
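One plausible reading of the BiLSTM approach above is a character-level encoder that maps a color name to an RGB triple. The sketch below assumes that direction, the toy ASCII encoding, and the layer sizes purely for illustration.

```python
# Minimal PyTorch sketch of a bidirectional LSTM that encodes a color
# name character by character and regresses an RGB triple. All sizes
# and the name->RGB direction are assumptions, not the paper's design.
import torch
import torch.nn as nn

class ColorNet(nn.Module):
    def __init__(self, vocab_size=128, embed=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed)
        self.lstm = nn.LSTM(embed, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, 3)  # predict (R, G, B) in [0, 1]

    def forward(self, chars):                 # chars: (batch, seq_len) int64
        out, _ = self.lstm(self.embed(chars))
        return torch.sigmoid(self.head(out[:, -1]))  # last step's features

name = torch.tensor([[ord(c) for c in "teal"]])   # toy ASCII encoding
rgb = ColorNet()(name)                            # untrained: random output
print((rgb * 255).round().squeeze().tolist())
```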
- L-CAD: Language-based Colorization with Any-level Descriptions using Diffusion Priors [62.80068955192816]
We propose a unified model to perform language-based colorization with any-level descriptions.
We leverage the pretrained cross-modality generative model for its robust language understanding and rich color priors.
With the proposed novel sampling strategy, our model achieves instance-aware colorization in diverse and complex scenarios.
arXiv Detail & Related papers (2023-05-24T14:57:42Z)
- ColorSense: A Study on Color Vision in Machine Visual Recognition [57.916512479603064]
We collect 110,000 non-trivial human annotations of foreground and background color labels from visual recognition benchmarks.
We validate the use of our datasets by demonstrating that the level of color discrimination has a dominating effect on the performance of machine perception models.
Our findings suggest that object recognition tasks such as classification and localization are susceptible to color vision bias.
arXiv Detail & Related papers (2022-12-16T18:51:41Z)
- Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models [57.08925810659545]
We conduct a comparative analysis of the visual representations in existing vision-and-language models and vision-only models.
Our empirical observations suggest that vision-and-language models are better at label prediction tasks.
We hope our study sheds light on the role of language in visual learning, and serves as an empirical guide for various pretrained models.
arXiv Detail & Related papers (2022-12-01T05:00:18Z)
- UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes [91.24112204588353]
We introduce UViM, a unified approach capable of modeling a wide range of computer vision tasks.
In contrast to previous models, UViM has the same functional form for all tasks.
We demonstrate the effectiveness of UViM on three diverse and challenging vision tasks.
arXiv Detail & Related papers (2022-05-20T17:47:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.