Grounding Visual Illusions in Language: Do Vision-Language Models
Perceive Illusions Like Humans?
- URL: http://arxiv.org/abs/2311.00047v1
- Date: Tue, 31 Oct 2023 18:01:11 GMT
- Title: Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?
- Authors: Yichi Zhang, Jiayi Pan, Yuchen Zhou, Rui Pan, Joyce Chai
- Abstract summary: Vision-Language Models (VLMs) are trained on vast amounts of data captured by humans, emulating our understanding of the world.
Do VLMs have similar kinds of illusions as humans do, or do they faithfully learn to represent reality?
We build a dataset containing five types of visual illusions and formulate four tasks to examine visual illusions in state-of-the-art VLMs.
- Score: 28.654771227396807
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-Language Models (VLMs) are trained on vast amounts of data captured by
humans, emulating our understanding of the world. However, human perception of
reality isn't always faithful to the physical world, a phenomenon known as
visual illusions. This raises a key question: do VLMs have similar kinds of
illusions as humans do, or do they faithfully learn to represent reality? To
investigate this question, we build a dataset containing five types of visual
illusions and formulate four tasks to examine visual illusions in
state-of-the-art VLMs. Our findings show that although the overall alignment is
low, larger models are closer to human perception and more susceptible to
visual illusions. Our
dataset and initial findings will promote a better understanding of visual
illusions in humans and machines and provide a stepping stone for future
computational models that can better align humans and machines in perceiving
and communicating about the shared visual world. The code and data are
available at https://github.com/vl-illusion/dataset.
Related papers
- IllusionBench: A Large-scale and Comprehensive Benchmark for Visual Illusion Understanding in Vision-Language Models [56.34742191010987]
Current Visual Language Models (VLMs) show impressive image understanding but struggle with visual illusions.
We introduce IllusionBench, a comprehensive visual illusion dataset that encompasses classic cognitive illusions and real-world scene illusions.
We design trap illusions that resemble classical patterns but differ in reality, highlighting issues in SOTA models.
arXiv Detail & Related papers (2025-01-01T14:10:25Z)
- The Art of Deception: Color Visual Illusions and Diffusion Models [55.830105086695]
Recent studies have shown that artificial neural networks (ANNs) can also be deceived by visual illusions.
We show how visual illusions are encoded in diffusion models.
We also show how to generate new unseen visual illusions in realistic images using text-to-image diffusion models.
arXiv Detail & Related papers (2024-12-13T13:07:08Z)
- Evaluating Model Perception of Color Illusions in Photorealistic Scenes [16.421832484760987]
We study the perception of color illusions by vision-language models.
We propose an automated framework for generating color illusion images.
Experiments show that all studied VLMs exhibit perceptual biases similar to human vision.
arXiv Detail & Related papers (2024-12-09T03:49:10Z)
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception [4.685953126232505]
We develop a dataset of visual illusions and benchmark using data-driven approach for illusion classification and localization.
We consider five types of brightness illusions: 1) Hermann grid, 2) Simultaneous Contrast, 3) White illusion, 4) Grid illusion, and 5) Induced Grating illusion.
The deep learning model is also shown to generalize to unseen brightness illusions, such as brightness assimilation to contrast transitions.
arXiv Detail & Related papers (2024-02-07T02:57:40Z)
- Improving generalization by mimicking the human visual diet [34.32585612888424]
We present a new perspective on bridging the generalization gap between biological and computer vision.
Our results demonstrate that incorporating variations and contextual cues ubiquitous in the human visual training data (visual diet) significantly improves generalization to real-world transformations.
arXiv Detail & Related papers (2022-06-15T20:32:24Z)
- Can machines learn to see without visual databases? [93.73109506642112]
This paper focuses on developing machines that learn to see without needing to handle visual databases.
This might open the doors to a truly competitive track concerning deep learning technologies for vision.
arXiv Detail & Related papers (2021-10-12T13:03:54Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms a visual-only state-of-the-art method MoCo.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
- Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense [142.53911271465344]
We argue that the next generation of AI must embrace "dark" humanlike common sense for solving novel tasks.
We identify functionality, physics, intent, causality, and utility (FPICU) as the five core domains of cognitive AI with humanlike common sense.
arXiv Detail & Related papers (2020-04-20T04:07:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.