Do you see what I see? An Ambiguous Optical Illusion Dataset exposing limitations of Explainable AI
- URL: http://arxiv.org/abs/2505.21589v1
- Date: Tue, 27 May 2025 12:22:59 GMT
- Title: Do you see what I see? An Ambiguous Optical Illusion Dataset exposing limitations of Explainable AI
- Authors: Carina Newen, Luca Hinkamp, Maria Ntonti, Emmanuel Müller
- Abstract summary: We introduce a novel dataset of optical illusions featuring intermingled animal pairs designed to evoke perceptual ambiguity. We identify generalizable visual concepts, particularly gaze direction and eye cues, as subtle yet impactful features that significantly influence model accuracy. Our findings underscore the importance of concepts in visual learning and provide a foundation for studying bias and alignment between human and machine vision.
- Score: 4.58733012283457
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: From uncertainty quantification to real-world object detection, machine learning algorithms are increasingly important, particularly in safety-critical domains such as autonomous driving and medical diagnostics. Ambiguous data plays an important role across machine learning domains. Optical illusions present a compelling area of study in this context, as they offer insight into the limitations of both human and machine perception. Despite this relevance, optical illusion datasets remain scarce. In this work, we introduce a novel dataset of optical illusions featuring intermingled animal pairs designed to evoke perceptual ambiguity. We identify generalizable visual concepts, particularly gaze direction and eye cues, as subtle yet impactful features that significantly influence model accuracy. By confronting models with perceptual ambiguity, our findings underscore the importance of concepts in visual learning and provide a foundation for studying bias and alignment between human and machine vision. To make this dataset useful for general purposes, we generate optical illusions systematically with the different concepts discussed in our bias-mitigation section. The dataset is accessible on Kaggle via https://kaggle.com/datasets/693bf7c6dd2cb45c8a863f9177350c8f9849a9508e9d50526e2ffcc5559a8333. Our source code can be found at https://github.com/KDD-OpenSource/Ambivision.git.
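Since the abstract positions the dataset as a stress test for image classifiers, one quick way to see the perceptual-ambiguity effect is to run a pretrained classifier over the illusion images and check whether its predictions split between the two animals in each pair. The sketch below is purely illustrative, not the authors' pipeline: the `./ambivision` folder layout and the choice of an ImageNet-pretrained ResNet-50 are assumptions, and the actual evaluation code lives in the Ambivision repository linked above.

```python
# Hypothetical probe: tally how a pretrained classifier labels ambiguous
# illusion images. Assumes the Kaggle download was unpacked into
# ./ambivision/<pair_label>/*.png, an assumed layout, not the dataset's
# documented structure.
from collections import Counter

import torch
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

dataset = datasets.ImageFolder("./ambivision", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=32)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.eval()

# Count top-1 ImageNet predictions per illusion folder.
counts = {cls: Counter() for cls in dataset.classes}
with torch.no_grad():
    for images, labels in loader:
        top1 = model(images).argmax(dim=1)
        for pred, label in zip(top1.tolist(), labels.tolist()):
            counts[dataset.classes[label]][pred] += 1

# A strongly ambiguous animal pair should spread its mass over several
# ImageNet classes rather than concentrate on a single one.
for cls, ctr in counts.items():
    print(cls, ctr.most_common(3))
```

A sharply bimodal prediction histogram for a pair is the machine-side analogue of the human ambiguity the dataset is designed to evoke.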
Related papers
- Do Large Vision-Language Models Distinguish between the Actual and Apparent Features of Illusions? [12.157632635072435]
Humans are susceptible to optical illusions, which serve as valuable tools for investigating sensory and cognitive processes. Research has begun exploring whether machines, such as large vision-language models (LVLMs), exhibit similar susceptibilities to visual illusions.
arXiv Detail & Related papers (2025-06-06T05:47:50Z)
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- Estimating the distribution of numerosity and non-numerical visual magnitudes in natural scenes using computer vision [0.08192907805418582]
We show that in natural visual scenes the frequency of appearance of different numerosities follows a power law distribution.
We show that the correlational structure for numerosity and continuous magnitudes is stable across datasets and scene types.
arXiv Detail & Related papers (2024-09-17T09:49:29Z)
- Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
arXiv Detail & Related papers (2023-12-20T22:36:37Z)
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- URLOST: Unsupervised Representation Learning without Stationarity or Topology [26.010647961403148]
We introduce a novel framework that learns from high-dimensional data without prior knowledge of stationarity and topology. Our model, abbreviated as URLOST, combines a learnable self-organizing layer, spectral clustering, and a masked autoencoder. We evaluate its effectiveness on three diverse data modalities including simulated biological vision data, neural recordings from the primary visual cortex, and gene expressions.
arXiv Detail & Related papers (2023-10-06T18:00:02Z)
- InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion [1.7980584146314789]
This paper introduces a novel approach to evaluating deep learning models' capacity for in-diagram logic interpretation.
We establish a unique dataset, InDL, designed to rigorously test and benchmark these models.
We utilize six classic geometric optical illusions to create a comparative framework between human and machine visual perception.
arXiv Detail & Related papers (2023-05-28T13:01:32Z)
- ColorSense: A Study on Color Vision in Machine Visual Recognition [57.916512479603064]
We collect 110,000 non-trivial human annotations of foreground and background color labels from visual recognition benchmarks. We validate the use of our datasets by demonstrating that the level of color discrimination has a dominating effect on the performance of machine perception models. Our findings suggest that object recognition tasks such as classification and localization are susceptible to color vision bias.
arXiv Detail & Related papers (2022-12-16T18:51:41Z)
- Multimodal perception for dexterous manipulation [14.314776558032166]
We propose a cross-modal sensory data generation framework for the translation between vision and touch.
We propose a spatio-temporal attention model for tactile texture recognition, which takes both spatial features and the time dimension into consideration.
arXiv Detail & Related papers (2021-12-28T21:20:26Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms MoCo, a visual-only state-of-the-art method.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
- Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense [142.53911271465344]
We argue that the next generation of AI must embrace "dark" humanlike common sense for solving novel tasks.
We identify functionality, physics, intent, causality, and utility (FPICU) as the five core domains of cognitive AI with humanlike common sense.
arXiv Detail & Related papers (2020-04-20T04:07:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.