Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent
- URL: http://arxiv.org/abs/2510.21704v1
- Date: Fri, 24 Oct 2025 17:59:02 GMT
- Title: Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent
- Authors: Christy Li, Josep Lopez Camuñas, Jake Thomas Touchet, Jacob Andreas, Agata Lapedriza, Antonio Torralba, Tamar Rott Shaham
- Abstract summary: We introduce an automated framework for detecting unintended reliance on visual features in vision models. A self-reflective agent generates and tests hypotheses about visual attributes that a model may rely on. We evaluate our approach on a novel benchmark of 130 models designed to exhibit diverse visual attribute dependencies.
- Score: 58.90049897180927
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When a vision model performs image recognition, which visual attributes drive its predictions? Detecting unintended reliance on specific visual features is critical for ensuring model robustness, preventing overfitting, and avoiding spurious correlations. We introduce an automated framework for detecting such dependencies in trained vision models. At the core of our method is a self-reflective agent that systematically generates and tests hypotheses about visual attributes that a model may rely on. This process is iterative: the agent refines its hypotheses based on experimental outcomes and uses a self-evaluation protocol to assess whether its findings accurately explain model behavior. When inconsistencies arise, the agent self-reflects over its findings and triggers a new cycle of experimentation. We evaluate our approach on a novel benchmark of 130 models designed to exhibit diverse visual attribute dependencies across 18 categories. Our results show that the agent's performance consistently improves with self-reflection, with a significant performance increase over non-reflective baselines. We further demonstrate that the agent identifies real-world visual attribute dependencies in state-of-the-art models, including CLIP's vision encoder and the YOLOv8 object detector.
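The abstract describes an iterative loop: the agent proposes attribute hypotheses, tests them experimentally, self-evaluates whether the surviving findings explain model behavior, and reflects to trigger a new cycle when they do not. A minimal, self-contained sketch of that loop is below; all names (`toy_model`, `ablate`, `perturbation_supported`, the attribute dictionaries) are illustrative stand-ins, not the authors' actual framework or API.

```python
def toy_model(img):
    # A deliberately flawed classifier: it spuriously relies on the
    # background attribute rather than the object itself.
    return "dog" if img["background"] == "grass" else "not_dog"

def ablate(img, attr):
    # Perturbation experiment: neutralize one attribute, leave the rest.
    out = dict(img)
    out[attr] = "neutral"
    return out

def perturbation_supported(model, images, attr, flip_rate=0.5):
    # A hypothesis is supported if ablating the attribute flips
    # the model's prediction on a large fraction of images.
    flips = sum(model(img) != model(ablate(img, attr)) for img in images)
    return flips / len(images) >= flip_rate

def detect_reliance(model, images, hypothesis_batches):
    # Each batch is one cycle of hypotheses; failure triggers "reflection",
    # i.e. moving on to the next batch of candidate attributes.
    findings = []
    for batch in hypothesis_batches:
        findings += [a for a in batch if perturbation_supported(model, images, a)]
        if findings:  # self-evaluation: findings explain model behavior
            return findings
    return findings

images = [{"object": o, "background": b}
          for o in ("dog", "cat") for b in ("grass", "indoor")]
# Cycle 1 tests the intuitive hypothesis ("object"); it fails, so the
# agent "reflects" and cycle 2 tests "background", which is supported.
print(detect_reliance(toy_model, images, [["object"], ["background"]]))
# → ['background']
```

The real agent uses a vision-language model to generate hypotheses and richer image-editing experiments to test them; this sketch only illustrates the control flow of hypothesize, test, self-evaluate, and reflect.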
Related papers
- Learning to Pay Attention: Unsupervised Modeling of Attentive and Inattentive Respondents in Survey Data [0.14323566945483493]
Traditional safeguards, such as attention checks, are often costly, reactive, and inconsistent. We propose a unified, label-free framework for inattentiveness detection using complementary unsupervised views.
arXiv Detail & Related papers (2026-03-02T22:11:51Z) - Feature-Aware Test Generation for Deep Learning Models [0.5368630420272898]
We introduce Detect, a feature-aware test generation framework for vision-based deep learning (DL) models. It generates inputs by perturbing disentangled semantic attributes within the latent space. It identifies which features lead to behavior shifts and uses a vision-language model for semantic attribution.
arXiv Detail & Related papers (2026-01-20T15:41:06Z) - Model Correlation Detection via Random Selection Probing [62.093777777813756]
Existing similarity-based methods require access to model parameters or produce scores without thresholds. We introduce Random Selection Probing (RSP), a hypothesis-testing framework that formulates model correlation detection as a statistical test. RSP produces rigorous p-values that quantify evidence of correlation.
arXiv Detail & Related papers (2025-09-29T01:40:26Z) - Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection [71.8243083897721]
Vision-language models often hallucinate details, generating non-existent objects or inaccurate attributes that compromise output reliability. We present a novel framework that leverages the model's self-consistency between long responses and short answers to generate preference pairs for training.
arXiv Detail & Related papers (2025-09-27T10:37:11Z) - Vision Foundation Model Embedding-Based Semantic Anomaly Detection [12.940376547110509]
This work explores semantic anomaly detection by leveraging the semantic priors of state-of-the-art vision foundation models. We propose a framework that compares local vision embeddings from runtime images to a database of nominal scenarios in which the autonomous system is deemed safe and performant.
arXiv Detail & Related papers (2025-05-12T19:00:29Z) - Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z) - Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z) - Prototype Generation: Robust Feature Visualisation for Data Independent Interpretability [1.223779595809275]
Prototype Generation is a stricter and more robust form of feature visualisation for model-agnostic, data-independent interpretability of image classification models.
We demonstrate its ability to generate inputs that result in natural activation paths, countering previous claims that feature visualisation algorithms are untrustworthy due to the unnatural internal activations.
arXiv Detail & Related papers (2023-09-29T11:16:06Z) - Using Positive Matching Contrastive Loss with Facial Action Units to Mitigate Bias in Facial Expression Recognition [6.015556590955814]
We propose to mitigate bias by guiding the model's focus towards task-relevant features using domain knowledge.
We show that incorporating task-relevant features via our method can improve model fairness at minimal cost to classification performance.
arXiv Detail & Related papers (2023-03-08T21:28:02Z) - Differential Assessment of Black-Box AI Agents [29.98710357871698]
We propose a novel approach to differentially assess black-box AI agents that have drifted from their previously known models.
We leverage sparse observations of the drifted agent's current behavior and knowledge of its initial model to generate an active querying policy.
Empirical evaluation shows that our approach is much more efficient than re-learning the agent model from scratch.
arXiv Detail & Related papers (2022-03-24T17:48:58Z) - AttriMeter: An Attribute-guided Metric Interpreter for Person Re-Identification [100.3112429685558]
Person ReID systems typically provide only a distance or similarity score when matching two persons.
We propose an Attribute-guided Metric Interpreter, named AttriMeter, to semantically and quantitatively explain the results of CNN-based ReID models.
arXiv Detail & Related papers (2021-03-02T03:37:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.