The Illusion-Illusion: Vision Language Models See Illusions Where There are None
- URL: http://arxiv.org/abs/2412.18613v1
- Date: Sat, 07 Dec 2024 03:30:51 GMT
- Title: The Illusion-Illusion: Vision Language Models See Illusions Where There are None
- Authors: Tomer Ullman
- Abstract summary: I show that many current vision language systems mistakenly see illusion-illusions as illusions.
I suggest that such failures are part of broader failures already discussed in the literature.
- Score: 0.0
- License:
- Abstract: Illusions are entertaining, but they are also a useful diagnostic tool in cognitive science, philosophy, and neuroscience. A typical illusion shows a gap between how something "really is" and how something "appears to be", and this gap helps us understand the mental processing that leads to how something appears to be. Illusions are also useful for investigating artificial systems, and much research has examined whether computational models of perception fall prey to the same illusions as people. Here, I invert the standard use of perceptual illusions to examine basic processing errors in current vision language models. I present these models with illusory-illusions, neighbors of common illusions that should not elicit processing errors. These include such things as perfectly reasonable ducks, crooked lines that truly are crooked, circles that seem to have different sizes because they are, in fact, of different sizes, and so on. I show that many current vision language systems mistakenly see these illusion-illusions as illusions. I suggest that such failures are part of broader failures already discussed in the literature.
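The setup described in the abstract lends itself to a simple illustration. The sketch below is an illustrative assumption, not the paper's actual stimuli or code: it generates one "illusion-illusion" control image, two circles that genuinely differ in size, so a model asked whether the circles are the same size should report a real difference rather than an illusion.

```python
# Minimal sketch of an "illusion-illusion" control stimulus: two circles that
# genuinely differ in size (unlike the Ebbinghaus illusion, where equal circles
# merely appear different). Sizes, positions, and the output filename are
# illustrative assumptions, not the paper's materials.
from PIL import Image, ImageDraw

def draw_circle(draw, center, radius, fill="black"):
    x, y = center
    draw.ellipse([x - radius, y - radius, x + radius, y + radius], fill=fill)

img = Image.new("RGB", (600, 300), "white")
draw = ImageDraw.Draw(img)

# The left circle is truly smaller than the right one.
draw_circle(draw, (150, 150), 40)
draw_circle(draw, (450, 150), 70)

img.save("different_sized_circles.png")
# A vision language model asked "Are these circles the same size?" should
# answer "no"; reporting an illusion here would be a processing error.
```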
Related papers
- IllusionBench: A Large-scale and Comprehensive Benchmark for Visual Illusion Understanding in Vision-Language Models [56.34742191010987]
Current Visual Language Models (VLMs) show impressive image understanding but struggle with visual illusions.
We introduce IllusionBench, a comprehensive visual illusion dataset that encompasses classic cognitive illusions and real-world scene illusions.
We design trap illusions that resemble classical patterns but differ in reality, highlighting issues in SOTA models.
arXiv Detail & Related papers (2025-01-01T14:10:25Z) - Slow Perception: Let's Perceive Geometric Figures Step-by-step [53.69067976062474]
We believe accurate copying (strong perception) is the first step to visual o1.
We introduce the concept of "slow perception" (SP), which guides the model to gradually perceive basic point-line combinations.
arXiv Detail & Related papers (2024-12-30T00:40:35Z) - The Art of Deception: Color Visual Illusions and Diffusion Models [55.830105086695]
Recent studies have shown that artificial neural networks (ANNs) can also be deceived by visual illusions.
We show how visual illusions are encoded in diffusion models.
We also show how to generate new unseen visual illusions in realistic images using text-to-image diffusion models.
arXiv Detail & Related papers (2024-12-13T13:07:08Z) - A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments [59.87080148922358]
We argue that the effects Makelov et al. (2023) see in practice are artifacts of their training and evaluation paradigms.
Though we disagree with their core characterization, Makelov et al. (2023)'s examples and discussion have undoubtedly pushed the field of interpretability forward.
arXiv Detail & Related papers (2024-01-23T10:27:42Z) - Diffusion Illusions: Hiding Images in Plain Sight [37.87050866208039]
Diffusion Illusions is the first comprehensive pipeline designed to automatically generate a wide range of illusions.
We study three types of illusions, each where the prime images are arranged in different ways.
We conduct comprehensive experiments on these illusions and verify the effectiveness of our proposed method.
arXiv Detail & Related papers (2023-12-06T18:59:18Z) - Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans? [28.654771227396807]
Vision-Language Models (VLMs) are trained on vast amounts of data captured by humans, emulating our understanding of the world.
Do VLMs have similar kinds of illusions as humans do, or do they faithfully learn to represent reality?
We build a dataset containing five types of visual illusions and formulate four tasks to examine visual illusions in state-of-the-art VLMs.
arXiv Detail & Related papers (2023-10-31T18:01:11Z) - HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models [69.52245481329899]
We introduce HallusionBench, a benchmark for the evaluation of image-context reasoning.
The benchmark comprises 346 images paired with 1129 questions, all meticulously crafted by human experts.
In our evaluation on HallusionBench, we benchmarked 15 different models, highlighting a 31.42% question-pair accuracy achieved by the state-of-the-art GPT-4V.
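For context on the reported number, question-pair accuracy is stricter than per-question accuracy: a pair is credited only if every question in it is answered correctly. The sketch below is one reading of such a metric for illustration, not HallusionBench's official evaluation code.

```python
# Hedged sketch of a question-pair accuracy metric: a pair counts as correct
# only when all of its questions are answered correctly. Illustrative only;
# not the benchmark's official scoring script.
from collections import defaultdict

def question_pair_accuracy(records):
    """records: iterable of (pair_id, is_correct) tuples."""
    pairs = defaultdict(list)
    for pair_id, is_correct in records:
        pairs[pair_id].append(is_correct)
    if not pairs:
        return 0.0
    return sum(all(answers) for answers in pairs.values()) / len(pairs)

# Example: two pairs; only the first has every question right.
records = [("p1", True), ("p1", True), ("p2", True), ("p2", False)]
print(question_pair_accuracy(records))  # 0.5
```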
arXiv Detail & Related papers (2023-10-23T04:49:09Z) - Evolutionary Generation of Visual Motion Illusions [0.0]
We present a generative model, the Evolutionary Illusion GENerator (EIGen), that creates new visual motion illusions.
The structure of EIGen supports the hypothesis that illusory motion might be the result of perceiving the brain's own predictions.
The scientific motivation of this paper is to demonstrate that the perception of illusory motion could be a side effect of the predictive abilities of the brain.
arXiv Detail & Related papers (2021-12-25T14:53:50Z) - Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense [142.53911271465344]
We argue that the next generation of AI must embrace "dark" humanlike common sense for solving novel tasks.
We identify functionality, physics, intent, causality, and utility (FPICU) as the five core domains of cognitive AI with humanlike common sense.
arXiv Detail & Related papers (2020-04-20T04:07:28Z)