InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation
based on Visual Illusion
- URL: http://arxiv.org/abs/2305.17716v4
- Date: Mon, 5 Jun 2023 22:52:57 GMT
- Title: InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation
based on Visual Illusion
- Authors: Haobo Yang, Wenyu Wang, Ze Cao, Zhekai Duan, Xuchen Liu
- Abstract summary: This paper introduces a novel approach to evaluating deep learning models' capacity for in-diagram logic interpretation.
We establish a unique dataset, InDL, designed to rigorously test and benchmark these models.
We utilize six classic geometric optical illusions to create a comparative framework between human and machine visual perception.
- Score: 1.7980584146314789
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper introduces a novel approach to evaluating deep learning models'
capacity for in-diagram logic interpretation. Leveraging the intriguing realm
of visual illusions, we establish a unique dataset, InDL, designed to
rigorously test and benchmark these models. Deep learning has witnessed
remarkable progress in domains such as computer vision and natural language
processing. However, models often stumble in tasks requiring logical reasoning
due to their inherent 'black box' characteristics, which obscure the
decision-making process. Our work presents a new lens to understand these
models better by focusing on their handling of visual illusions -- a complex
interplay of perception and logic. We utilize six classic geometric optical
illusions to create a comparative framework between human and machine visual
perception. This methodology offers a quantifiable measure to rank models,
elucidating potential weaknesses and providing actionable insights for model
improvements. Our experimental results affirm the efficacy of our benchmarking
strategy, demonstrating its ability to effectively rank models based on their
logic interpretation ability. As part of our commitment to reproducible
research, the source code and datasets will be made publicly available at
https://github.com/rabbit-magic-wh/InDL
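The abstract describes ranking models by how well they interpret illusion diagrams but does not spell out the data format or metric. The following is only a minimal sketch of that idea, assuming a Müller-Lyer-style length comparison task and plain accuracy as the ranking measure. Every name here (`Item`, `make_items`, `rank_models`, the toy models) is hypothetical and is not taken from the InDL code base.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical benchmark item: two shaft lengths (pixels) plus an illusion cue."""
    left_len: int
    right_len: int
    fins_out_on_left: bool  # assumed Mueller-Lyer-style distractor

    @property
    def label(self) -> str:
        # Ground truth depends only on the true lengths, never on the fins.
        if self.left_len > self.right_len:
            return "left"
        if self.left_len < self.right_len:
            return "right"
        return "equal"


def make_items(n, seed=0):
    """Generate n synthetic items with small, known length differences."""
    rng = random.Random(seed)
    items = []
    for _ in range(n):
        base = rng.randint(40, 80)
        delta = rng.choice([-6, -3, 0, 3, 6])
        items.append(Item(base, base + delta, rng.random() < 0.5))
    return items


def accuracy(predict, items):
    """Fraction of items for which the model's answer matches the label."""
    return sum(predict(it) == it.label for it in items) / len(items)


def rank_models(models, items):
    """Return (name, accuracy) pairs sorted from best to worst."""
    scores = {name: accuracy(fn, items) for name, fn in models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def oracle(item):
    # Toy model that always reads the true lengths.
    return item.label


def fooled(item):
    # Toy model that perceives the shaft with outward fins as 4 px longer.
    bias = 4 if item.fins_out_on_left else -4
    diff = item.left_len + bias - item.right_len
    return "left" if diff > 0 else "right" if diff < 0 else "equal"


def always_equal(item):
    # Toy baseline that ignores the diagram entirely.
    return "equal"


items = make_items(200)
ranking = rank_models(
    {"oracle": oracle, "fooled": fooled, "always_equal": always_equal}, items
)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this sketch the illusion cue never changes the label, so a model that is swayed by it loses accuracy on the small-difference items, which is what lets a single score separate the models.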
Related papers
- Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models [27.806966289284528]
We present a unified framework using sparse autoencoders (SAEs) to discover human-interpretable visual features.
We show that SAEs can reliably identify and manipulate interpretable visual features without model re-training.
arXiv Detail & Related papers (2025-02-10T18:32:41Z)
- Language Model as Visual Explainer [72.88137795439407]
We present a systematic approach for interpreting vision models using a tree-structured linguistic explanation.
Our method provides human-understandable explanations in the form of attribute-laden trees.
To assess the effectiveness of our approach, we introduce new benchmarks and conduct rigorous evaluations.
arXiv Detail & Related papers (2024-12-08T20:46:23Z)
- SOLD: Slot Object-Centric Latent Dynamics Models for Relational Manipulation Learning from Pixels [16.020835290802548]
Slot-Attention for Object-centric Latent Dynamics is a novel model-based reinforcement learning algorithm.
It learns object-centric dynamics models in an unsupervised manner from pixel inputs.
We demonstrate that the structured latent space not only improves model interpretability but also provides a valuable input space for behavior models to reason over.
arXiv Detail & Related papers (2024-10-11T14:03:31Z)
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms [91.19304518033144]
We aim to align vision models with human aesthetic standards in a retrieval system.
We propose a preference-based reinforcement learning method that fine-tunes the vision models to better align them with human aesthetics.
arXiv Detail & Related papers (2024-06-13T17:59:20Z)
- Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models.
Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
arXiv Detail & Related papers (2024-04-02T22:27:24Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- Human-Understandable Decision Making for Visual Recognition [30.30163407674527]
We propose a new framework to train a deep neural network by incorporating the prior of human perception into the model learning process.
The effectiveness of our proposed model is evaluated on two classical visual recognition tasks.
arXiv Detail & Related papers (2021-03-05T02:07:33Z)
- Generative Counterfactuals for Neural Networks via Attribute-Informed
Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP).
By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently.
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality, and efficiency of our framework.
arXiv Detail & Related papers (2021-01-18T08:37:13Z)
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with
Realistic Adversarial Examples [84.8370546614042]
The black-box nature of deep learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objectives are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.