Multi-Rationale Explainable Object Recognition via Contrastive Conditional Inference
- URL: http://arxiv.org/abs/2508.14280v1
- Date: Tue, 19 Aug 2025 21:28:12 GMT
- Title: Multi-Rationale Explainable Object Recognition via Contrastive Conditional Inference
- Authors: Ali Rasekh, Sepehr Kazemi Ranjbar, Simon Gottschalk
- Abstract summary: We introduce a multi-rationale explainable object recognition benchmark comprising datasets in which each image is annotated with multiple ground-truth rationales. We propose a contrastive conditional inference framework that explicitly models the probabilistic relationships among image embeddings, category labels, and rationales. Our approach achieves state-of-the-art results on the multi-rationale explainable object recognition benchmark, including strong zero-shot performance.
- Score: 1.2309843977641421
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explainable object recognition using vision-language models such as CLIP involves predicting accurate category labels supported by rationales that justify the decision-making process. Existing methods typically rely on prompt-based conditioning, which suffers from limitations in CLIP's text encoder and provides weak conditioning on explanatory structures. Additionally, prior datasets are often restricted to single, and frequently noisy, rationales that fail to capture the full diversity of discriminative image features. In this work, we introduce a multi-rationale explainable object recognition benchmark comprising datasets in which each image is annotated with multiple ground-truth rationales, along with evaluation metrics designed to offer a more comprehensive representation of the task. To overcome the limitations of previous approaches, we propose a contrastive conditional inference (CCI) framework that explicitly models the probabilistic relationships among image embeddings, category labels, and rationales. Without requiring any training, our framework enables more effective conditioning on rationales to predict accurate object categories. Our approach achieves state-of-the-art results on the multi-rationale explainable object recognition benchmark, including strong zero-shot performance, and sets a new standard for both classification accuracy and rationale quality. Together with the benchmark, this work provides a more complete framework for evaluating future models in explainable object recognition. The code will be made available online.
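The training-free conditioning the abstract describes can be illustrated with a small, hypothetical sketch: treat softmax-normalized CLIP-style similarities over (category, rationale) prompt pairs as a joint p(y, r | x), then condition on a rationale to obtain p(y | x, r). The embeddings, shapes, and temperature below are toy stand-ins, not the authors' actual CCI implementation.

```python
import numpy as np

def joint_probs(image_emb, text_embs, temperature=0.5):
    """Softmax over similarities between one image embedding and text
    embeddings for every (category, rationale) prompt pair.
    Shapes: image_emb (d,), text_embs (K, R, d) for K categories, R rationales."""
    K, R, d = text_embs.shape
    sims = text_embs.reshape(K * R, d) @ image_emb  # dot products (unit vectors)
    p = np.exp(sims / temperature)
    p /= p.sum()
    return p.reshape(K, R)  # joint p(y, r | x)

def condition_on_rationale(p_joint, r_idx):
    """p(y | x, r): condition the joint on a fixed rationale column."""
    col = p_joint[:, r_idx]
    return col / col.sum()

# toy orthonormal "prompt embeddings": 3 categories x 2 rationales in 6 dims
text = np.eye(6).reshape(3, 2, 6)
img = text[1, 0]  # image embedding aligned with (category 1, rationale 0)

p = joint_probs(img, text)
print(condition_on_rationale(p, 0).argmax())  # -> 1
```

Conditioning on the rationale reweights the category posterior, which is the mechanism that lets a rationale sharpen the label prediction without any additional training.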
Related papers
- ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts [54.60525564599342]
ConceptScope is a scalable and automated framework for analyzing visual datasets. It categorizes concepts into target, context, and bias types based on their semantic relevance and statistical correlation to class labels. It reliably detects known biases and uncovers previously unannotated ones.
arXiv Detail & Related papers (2025-10-30T06:46:17Z)
- From Visual Explanations to Counterfactual Explanations with Latent Diffusion [11.433402357922414]
We propose a new approach to tackle two key challenges in recent prominent works. First, we determine which specific counterfactual features are crucial for distinguishing the "concept" of the target class from the original class. Second, we provide valuable explanations for the non-robust classifier without relying on the support of an adversarially robust model.
arXiv Detail & Related papers (2025-04-12T13:04:00Z)
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
- LLM-based Hierarchical Concept Decomposition for Interpretable Fine-Grained Image Classification [5.8754760054410955]
We introduce Hi-CoDecomposition, a novel framework designed to enhance model interpretability through structured concept analysis.
Our approach not only aligns with the performance of state-of-the-art models but also advances transparency by providing clear insights into the decision-making process.
arXiv Detail & Related papers (2024-05-29T00:36:56Z)
- ECOR: Explainable CLIP for Object Recognition [4.385998292803586]
We propose a mathematical definition of explainability in the object recognition task based on the joint probability distribution of categories and rationales.
Our method demonstrates state-of-the-art performance in explainable classification.
This advancement improves explainable object recognition, enhancing trust across diverse applications.
arXiv Detail & Related papers (2024-04-19T12:20:49Z)
- RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition [78.97487780589574]
Multimodal Large Language Models (MLLMs) excel at classifying fine-grained categories.
This paper introduces a Retrieving And Ranking augmented method for MLLMs.
Our proposed approach not only addresses the inherent limitations in fine-grained recognition but also preserves the model's comprehensive knowledge base.
arXiv Detail & Related papers (2024-03-20T17:59:55Z)
- Recursive Counterfactual Deconfounding for Object Recognition [20.128093193861165]
We propose a Recursive Counterfactual Deconfounding model for object recognition in both closed-set and open-set scenarios.
We show that the proposed RCD model significantly outperforms 11 state-of-the-art baselines in most cases.
arXiv Detail & Related papers (2023-09-25T07:46:41Z)
- Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems [61.11799513362704]
We propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes.
We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective.
arXiv Detail & Related papers (2023-03-03T02:07:40Z)
- Resolving label uncertainty with implicit posterior models [71.62113762278963]
We propose a method for jointly inferring labels across a collection of data samples.
By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs.
arXiv Detail & Related papers (2022-02-28T18:09:44Z)
- Contrastive Learning for Fair Representations [50.95604482330149]
Trained classification models can unintentionally lead to biased representations and predictions.
Existing debiasing methods for classification models, such as adversarial training, are often expensive to train and difficult to optimise.
We propose a method for mitigating bias by incorporating contrastive learning, in which instances sharing the same class label are encouraged to have similar representations.
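The idea of encouraging same-class instances to share similar representations can be sketched with a generic supervised contrastive loss over normalized embeddings; this illustrates the technique named in the summary, not that paper's exact objective, and the embeddings below are hypothetical toy data.

```python
import numpy as np

def supervised_contrastive_loss(z, labels, tau=0.1):
    """For each anchor, same-label instances act as positives and all other
    instances as negatives; lower loss means same-class embeddings are closer."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize rows
    sims = z @ z.T / tau
    np.fill_diagonal(sims, -np.inf)                    # exclude self-similarity
    log_den = np.log(np.exp(sims).sum(axis=1))         # log contrastive denominator
    total, anchors = 0.0, 0
    for i, yi in enumerate(labels):
        pos = [j for j, yj in enumerate(labels) if j != i and yj == yi]
        if not pos:
            continue
        total += -np.mean([sims[i, j] - log_den[i] for j in pos])
        anchors += 1
    return total / anchors

emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
tight = supervised_contrastive_loss(emb, [0, 0, 1, 1])   # positives are close
mixed = supervised_contrastive_loss(emb, [0, 1, 0, 1])   # positives are far apart
print(tight < mixed)  # expected: True
```

With identical embeddings, the labeling that groups nearby points gets the lower loss, which is exactly the pressure that pulls same-class representations together during training.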
arXiv Detail & Related papers (2021-09-22T10:47:51Z)
- Recognition Awareness: An Application of Latent Cognizance to Open-Set Recognition [0.0]
The softmax mechanism forces a model to predict an object class from a set of pre-defined labels. This characteristic contributes to classification efficacy but poses a risk of nonsensical predictions in object recognition. Open-Set Recognition is intended to address the issue of identifying foreign objects in object recognition.
arXiv Detail & Related papers (2021-08-27T04:41:41Z)
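The closed-set behaviour described in that summary is easy to see in a toy sketch: softmax always commits to one of the pre-defined labels, so even a foreign input receives a "valid" class. A common generic workaround (not the Latent Cognizance method of that paper) is to reject predictions whose maximum probability falls below a threshold; the threshold value here is an arbitrary assumption.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())  # shift for numerical stability
    return e / e.sum()

def predict_with_rejection(logits, threshold=0.5):
    """Return a class index, or None when the model is not confident enough:
    a simple confidence-threshold stand-in for open-set recognition."""
    p = softmax(np.asarray(logits, dtype=float))
    return int(p.argmax()) if p.max() >= threshold else None

print(predict_with_rejection([5.0, 0.1, 0.2]))   # confident known class -> 0
print(predict_with_rejection([0.1, 0.2, 0.15]))  # near-uniform logits -> None
```

Plain argmax over the first call's softmax would also return 0, but on the second, near-uniform input it would still commit to a class, which is the nonsense-prediction risk the summary describes.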
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.