Color in Visual-Language Models: CLIP deficiencies
- URL: http://arxiv.org/abs/2502.04470v1
- Date: Thu, 06 Feb 2025 19:38:12 GMT
- Title: Color in Visual-Language Models: CLIP deficiencies
- Authors: Guillem Arias, Ramon Baldrich, Maria Vanrell
- Abstract summary: This work explores how color is encoded in CLIP (Contrastive Language-Image Pre-training), currently the most influential VLM (Visual-Language Model) in Artificial Intelligence.
We identify two main deficiencies: (a) a clear bias against achromatic stimuli, which are only weakly tied to the concept of color, and (b) a tendency to prioritize text over other visual information.
- Score: 1.0159205678719043
- License:
- Abstract: This work explores how color is encoded in CLIP (Contrastive Language-Image Pre-training), currently the most influential VLM (Visual-Language Model) in Artificial Intelligence. After performing different experiments on synthetic datasets created for this task, we conclude that CLIP is able to attribute correct color labels to colored visual stimuli, but we come across two main deficiencies: (a) a clear bias against achromatic stimuli, which are only weakly related to the concept of color, so white, gray, and black are rarely assigned as color labels; and (b) a tendency to prioritize text over other visual information, which we show to be highly significant in color labelling through an exhaustive Stroop-effect test. To find the causes of these color deficiencies, we analyse the internal representation at the neuron level. We conclude that CLIP contains a large number of neurons selective to text, especially in the deepest layers of the network, and a smaller number of multi-modal color neurons, which could be the key to understanding the concept of color properly. Our investigation underscores the necessity of refining color representation mechanisms in neural networks to foster a more comprehensive understanding of color as humans perceive it, thereby advancing the efficacy and versatility of multimodal models like CLIP in real-world scenarios.
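A minimal sketch of the kind of Stroop-style colour-labelling probe described in the abstract, assuming the Hugging Face transformers checkpoint openai/clip-vit-base-patch32; the synthetic green patch, the conflicting word "red", and the prompt template are illustrative choices, not the authors' exact stimuli or prompts.

```python
# Illustrative Stroop-style probe of CLIP colour labelling (not the paper's
# exact protocol): a uniformly green patch carrying the conflicting word "red"
# is labelled zero-shot against a small set of colour prompts.
import torch
from PIL import Image, ImageDraw
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Synthetic stimulus: green square with the incongruent text "red" drawn on it.
patch = Image.new("RGB", (224, 224), color=(0, 160, 0))
ImageDraw.Draw(patch).text((95, 105), "red", fill=(255, 255, 255))

colors = ["red", "green", "blue", "yellow", "white", "gray", "black"]
prompts = [f"a photo of the color {c}" for c in colors]  # illustrative template

inputs = processor(text=prompts, images=patch, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1).squeeze(0)

# Print colour labels ranked by probability.
for c, p in sorted(zip(colors, probs.tolist()), key=lambda kv: -kv[1]):
    print(f"{c:>6s}  {p:.3f}")
```

Under the text-prioritization deficiency reported in the abstract, such a probe would tend to rank the written word's colour ("red") above the patch's actual colour ("green").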
Related papers
- Primary visual cortex contributes to color constancy by predicting rather than discounting the illuminant: evidence from a computational study [15.2781669109191]
We build an electrophysiologically based V1 neural model to learn the color of the light source from a natural image dataset.
We find that both the spatial structures and color weights of the receptive fields of the learned model neurons are quite similar to those of the simple and double-opponent (DO) neurons recorded in V1.
arXiv Detail & Related papers (2024-12-10T01:42:49Z)
- Divergences in Color Perception between Deep Neural Networks and Humans [3.0315685825606633]
We develop experiments for evaluating the perceptual coherence of color embeddings in deep neural networks (DNNs).
We assess how well these algorithms predict human color similarity judgments collected via an online survey.
We compare DNN performance against an interpretable and cognitively plausible model of color perception based on wavelet decomposition.
arXiv Detail & Related papers (2023-09-11T20:26:40Z)
- Name Your Colour For the Task: Artificially Discover Colour Naming via Colour Quantisation Transformer [62.75343115345667]
We propose a novel colour quantisation transformer, CQFormer, that quantises colour space while maintaining machine recognition on the quantised images.
We observe the consistent evolution pattern between our artificial colour system and basic colour terms across human languages.
Our colour quantisation also serves as an efficient compression method that effectively reduces image storage.
arXiv Detail & Related papers (2022-12-07T03:39:18Z)
- Exploration of the Usage of Color Terms by Color-blind Participants in Online Discussion Platforms [4.445130093341008]
We show that red-green color-blind speakers use the "red" and "green" color terms in less predictable contexts.
These findings shed new light on the role of sensory experience in shaping our linguistic system.
arXiv Detail & Related papers (2022-10-21T12:11:10Z)
- Learning to Structure an Image with Few Colors and Beyond [59.34619548026885]
We propose a color quantization network, ColorCNN, which learns to structure an image in limited color spaces by minimizing the classification loss.
We introduce ColorCNN+, which supports multiple color space size configurations, and addresses the previous issues of poor recognition accuracy and undesirable visual fidelity under large color spaces.
For potential applications, we show that ColorCNNs can be used as image compression methods for network recognition.
arXiv Detail & Related papers (2022-08-17T17:59:15Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments (a minimal zero-shot sketch follows after this list).
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
- Influence of Color Spaces for Deep Learning Image Colorization [2.3705923859070217]
Existing colorization methods rely on different color spaces: RGB, YUV, Lab, etc.
In this chapter, we aim to study their influence on the results obtained by training a deep neural network.
We compare the results obtained with the same deep neural network architecture with RGB, YUV and Lab color spaces.
arXiv Detail & Related papers (2022-04-06T14:14:07Z)
- Assessing The Importance Of Colours For CNNs In Object Recognition [70.70151719764021]
Convolutional neural networks (CNNs) have been shown to exhibit conflicting properties regarding their reliance on colour information.
We demonstrate that CNNs often rely heavily on colour information while making a prediction.
We evaluate a model trained with congruent images on congruent, greyscale, and incongruent images.
arXiv Detail & Related papers (2020-12-12T22:55:06Z)
- Is It a Plausible Colour? UCapsNet for Image Colourisation [38.88087332284959]
We introduce a novel architecture for colourisation of grayscale images.
The architecture is based on Capsules trained following the adversarial learning paradigm.
We show that our approach is able to generate more vibrant and plausible colours than existing solutions.
arXiv Detail & Related papers (2020-12-04T09:07:13Z)
- Semantic-driven Colorization [78.88814849391352]
Recent colorization works implicitly predict the semantic information while learning to colorize black-and-white images.
In this study, we mimic this human-like behaviour: our network first learns to understand the photo, then colorizes it.
arXiv Detail & Related papers (2020-06-13T08:13:30Z)
- Learning to Structure an Image with Few Colors [59.34619548026885]
We propose a color quantization network, ColorCNN, which learns to structure the images from the classification loss in an end-to-end manner.
With only a 1-bit color space (i.e., two colors), the proposed network achieves 82.1% top-1 accuracy on the CIFAR10 dataset.
For applications, when encoded with PNG, the proposed color quantization shows superiority over other image compression methods in the extremely low bit-rate regime.
arXiv Detail & Related papers (2020-03-17T17:56:15Z)
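Returning to "Exploring CLIP for Assessing the Look and Feel of Images" above, the sketch below illustrates one plausible form of zero-shot prompt-pair assessment with CLIP; the antonym prompts, the checkpoint, and the file name example.jpg are assumptions made for illustration, not necessarily that paper's design.

```python
# Illustrative zero-shot "look"/"feel" scoring with CLIP via antonym prompt
# pairs; prompts and checkpoint are assumptions, not the paper's exact setup.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Each attribute is scored as the probability of the positive prompt under a
# softmax over the antonym pair.
prompt_pairs = {
    "quality (look)": ("a sharp, high quality photo", "a blurry, low quality photo"),
    "mood (feel)": ("a happy, pleasant photo", "a gloomy, unpleasant photo"),
}

image = Image.open("example.jpg")  # hypothetical input image

for name, (pos, neg) in prompt_pairs.items():
    inputs = processor(text=[pos, neg], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        pair_probs = model(**inputs).logits_per_image.softmax(dim=-1).squeeze(0)
    print(f"{name}: {pair_probs[0].item():.3f}")  # score in [0, 1]
```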
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences arising from its use.