Doubly Right Object Recognition: A Why Prompt for Visual Rationales
- URL: http://arxiv.org/abs/2212.06202v2
- Date: Wed, 22 Mar 2023 21:05:49 GMT
- Title: Doubly Right Object Recognition: A Why Prompt for Visual Rationales
- Authors: Chengzhi Mao, Revant Teotia, Amrutha Sundar, Sachit Menon, Junfeng
Yang, Xin Wang, Carl Vondrick
- Abstract summary: We investigate whether computer vision models can also provide correct rationales for their predictions.
We propose a ``doubly right'' object recognition benchmark, where the metric requires the model to simultaneously produce both the right labels and the right rationales.
- Score: 28.408764714247837
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Many visual recognition models are evaluated only on their classification
accuracy, a metric for which they obtain strong performance. In this paper, we
investigate whether computer vision models can also provide correct rationales
for their predictions. We propose a ``doubly right'' object recognition
benchmark, where the metric requires the model to simultaneously produce both
the right labels and the right rationales. We find that state-of-the-art
visual models, such as CLIP, often provide incorrect rationales for their
categorical predictions. However, by transferring the rationales from language
models into visual representations through a tailored dataset, we show that we
can learn a ``why prompt,'' which adapts large visual representations to
produce correct rationales. Visualizations and empirical experiments show that
our prompts significantly improve performance on doubly right object
recognition, in addition to zero-shot transfer to unseen tasks and datasets.
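
To make the metric concrete, below is a minimal sketch of doubly right evaluation with OpenAI's CLIP package; the label-rationale pairs, the "a photo of a {label}, because {why}" template, and the helper `doubly_right` are illustrative assumptions, not the paper's released benchmark or prompts.

```python
# Minimal sketch of "doubly right" scoring with CLIP (assumptions:
# candidate pairs, prompt template, and helper are illustrative only).
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical label/rationale pool; the real benchmark pairs each
# category with rationales transferred from a language model.
candidates = [
    ("zebra", "it has black-and-white stripes"),
    ("zebra", "it has a long trunk"),                # right label, wrong rationale
    ("elephant", "it has a long trunk"),
    ("elephant", "it has black-and-white stripes"),  # right label, wrong rationale
]
texts = clip.tokenize(
    [f"a photo of a {label}, because {why}" for label, why in candidates]
).to(device)

def doubly_right(image_path: str, true_label: str, true_rationale: str) -> bool:
    """Count a prediction as correct only if the top-scoring prompt
    carries BOTH the true label and the true rationale."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, texts)
    label, why = candidates[logits_per_image.argmax().item()]
    return label == true_label and why == true_rationale
```

The point of the sketch is that the argmax runs over label-rationale pairs: a correct label attached to a wrong rationale counts as a failure, which is where the paper reports vanilla CLIP falling short.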
Related papers
- RealCQA-V2 : Visual Premise Proving [2.9201864249313383]
We introduce Visual Premise Proving, a novel task tailored to refine the process of chart question answering.
This approach represents a departure from conventional accuracy-based evaluation methods.
A model adept at reasoning is expected to demonstrate proficiency in both data retrieval and the structural understanding of charts.
arXiv Detail & Related papers (2024-10-29T19:32:53Z)
- Assessing Graphical Perception of Image Embedding Models using Channel Effectiveness [20.269583912221734]
We introduce a novel evaluation framework to assess the graphical perception of image embedding models.
For chart comprehension, we examine two main aspects of channel effectiveness: accuracy and discriminability of various visual channels.
Experiments with the CLIP model show that it perceives channel accuracy differently from humans and shows unique discriminability in channels like length, tilt, and curvature.
arXiv Detail & Related papers (2024-07-30T14:22:13Z)
- ECOR: Explainable CLIP for Object Recognition [4.385998292803586]
We propose a mathematical definition of explainability in the object recognition task based on the joint probability distribution of categories and rationales (one plausible formalization is sketched after this entry).
Our method demonstrates state-of-the-art performance in explainable classification.
This advancement improves explainable object recognition, enhancing trust across diverse applications.
arXiv Detail & Related papers (2024-04-19T12:20:49Z)
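
As referenced above, one plausible formalization of "the joint probability distribution of categories and rationales" is the factorization below; this is a hedged reconstruction from the one-line summary, not ECOR's exact definition.

```latex
% Hedged reconstruction (assumption, not ECOR's exact formulation):
% a prediction is explainable when the model jointly maximizes the
% category c and the rationale r given the image x.
\[
  p(c, r \mid x) = p(r \mid x)\, p(c \mid r, x), \qquad
  (\hat{c}, \hat{r}) = \mathop{\mathrm{arg\,max}}_{(c,\, r)} p(c, r \mid x)
\]
```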
- Classes Are Not Equal: An Empirical Study on Image Recognition Fairness [100.36114135663836]
We experimentally demonstrate that classes are not equal and that the fairness issue is prevalent in image classification models across various datasets.
Our findings reveal that models tend to exhibit greater prediction biases for classes that are more challenging to recognize.
Data augmentation and representation learning algorithms improve overall performance by promoting fairness to some degree in image classification.
arXiv Detail & Related papers (2024-02-28T07:54:50Z)
- Recursive Counterfactual Deconfounding for Object Recognition [20.128093193861165]
We propose a Recursive Counterfactual Deconfounding (RCD) model for object recognition in both closed-set and open-set scenarios.
We show that the proposed RCD model performs significantly better than 11 state-of-the-art baselines in most cases.
arXiv Detail & Related papers (2023-09-25T07:46:41Z)
- See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning [60.43585179885355]
We propose a novel framework named Interactive Prompting Visual Reasoner (IPVR) for few-shot knowledge-based visual reasoning.
IPVR contains three stages: see, think, and confirm (a structural sketch follows this entry).
We conduct experiments on a range of knowledge-based visual reasoning datasets.
arXiv Detail & Related papers (2023-01-12T18:59:50Z)
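
As noted above, the three stages suggest a loop like the following; this is a structural sketch only, and `see`, `think`, and `confirm` are hypothetical stand-ins (stubbed so the snippet runs), not IPVR's actual interfaces.

```python
# Structural sketch of a see-think-confirm loop (assumption: all three
# helpers are hypothetical stand-ins, stubbed so the snippet runs).

def see(image) -> str:
    # Vision stage stand-in: caption / extract visual evidence.
    return "a person holding an umbrella on a rainy street"

def think(question: str, evidence: str) -> tuple[str, str]:
    # Reasoning stage stand-in: an LLM drafts (answer, rationale).
    return "it is raining", f"the image shows {evidence}"

def confirm(evidence: str, rationale: str) -> bool:
    # Confirmation stage stand-in: check the rationale is grounded
    # in what the vision stage actually saw.
    return evidence in rationale

def answer(image, question: str, max_rounds: int = 3) -> str:
    evidence = see(image)
    draft = ""
    for _ in range(max_rounds):
        draft, rationale = think(question, evidence)
        if confirm(evidence, rationale):
            break  # grounded rationale: accept the draft answer
    return draft

print(answer(image=None, question="What is the weather like?"))
```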
- Information-Theoretic Odometry Learning [83.36195426897768]
We propose a unified information-theoretic framework for learning-motivated methods aimed at odometry estimation.
The proposed framework provides an elegant tool for performance evaluation and understanding in information-theoretic language.
arXiv Detail & Related papers (2022-03-11T02:37:35Z)
- Masked prediction tasks: a parameter identifiability view [49.533046139235466]
We focus on the widely used self-supervised learning method of predicting masked tokens.
We show that there is a rich landscape of possibilities, out of which some prediction tasks yield identifiability, while others do not.
arXiv Detail & Related papers (2022-02-18T17:09:32Z)
- Desiderata for Representation Learning: A Causal Perspective [104.3711759578494]
We take a causal perspective on representation learning, formalizing non-spuriousness and efficiency (in supervised representation learning) and disentanglement (in unsupervised representation learning).
This yields computable metrics that can be used to assess the degree to which representations satisfy the desiderata of interest and learn non-spurious and disentangled representations from single observational datasets.
arXiv Detail & Related papers (2021-09-08T17:33:54Z)
- Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP).
By utilizing generative models conditioned on different attributes, counterfactuals with desired labels can be obtained effectively and efficiently (a toy sketch follows this entry).
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality as well as efficiency of our designed framework.
arXiv Detail & Related papers (2021-01-18T08:37:13Z)
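
As noted above, the recipe, conditioning a generator on attributes and keeping samples that the classifier assigns the desired label, can be sketched as below; `generate` and `classify` are toy stand-ins, not the paper's generative model or target network.

```python
# Toy sketch of attribute-informed counterfactual search (assumption:
# the generator and classifier below are stand-ins, not the paper's models).
import random

ATTRIBUTES = {"smiling": [0.0, 1.0], "age": [20, 40, 60]}  # toy attribute space

def generate(attrs: dict) -> str:
    # Stand-in for an attribute-conditioned generative model.
    return f"image({attrs})"

def classify(sample: str) -> str:
    # Stand-in for the classifier whose decision we want to flip.
    return "positive" if "'smiling': 1.0" in sample else "negative"

def counterfactual(attrs: dict, target_label: str, tries: int = 100):
    """Perturb attributes until the generated sample gets target_label."""
    for _ in range(tries):
        perturbed = {k: random.choice(v) for k, v in ATTRIBUTES.items()}
        sample = generate({**attrs, **perturbed})
        if classify(sample) == target_label:
            return sample, perturbed
    return None, None

sample, change = counterfactual({"smiling": 0.0, "age": 20}, "positive")
print(sample, change)
```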
- SHOP-VRB: A Visual Reasoning Benchmark for Object Perception [26.422761228628698]
We present an approach and a benchmark for visual reasoning in robotics applications.
We focus on inferring object properties from visual and text data.
We propose a reasoning system based on symbolic program execution.
arXiv Detail & Related papers (2020-04-06T13:46:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.