Related papers: Benchmarking the Attribution Quality of Vision Models

Benchmarking the Attribution Quality of Vision Models

URL: http://arxiv.org/abs/2407.11910v1
Date: Tue, 16 Jul 2024 17:02:20 GMT
Title: Benchmarking the Attribution Quality of Vision Models
Authors: Robin Hesse, Simone Schaub-Meyer, Stefan Roth,
Abstract summary: We propose a novel evaluation protocol that overcomes two fundamental limitations of the widely used incremental-deletion protocol. This allows us to evaluate 23 attribution methods and how eight different design choices of popular vision models affect their attribution quality. We find that intrinsically explainable models outperform standard models and that raw attribution values exhibit a higher attribution quality than what is known from previous work.
Score: 13.255247017616687
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Attribution maps are one of the most established tools to explain the functioning of computer vision models. They assign importance scores to input features, indicating how relevant each feature is for the prediction of a deep neural network. While much research has gone into proposing new attribution methods, their proper evaluation remains a difficult challenge. In this work, we propose a novel evaluation protocol that overcomes two fundamental limitations of the widely used incremental-deletion protocol, i.e., the out-of-domain issue and lacking inter-model comparisons. This allows us to evaluate 23 attribution methods and how eight different design choices of popular vision models affect their attribution quality. We find that intrinsically explainable models outperform standard models and that raw attribution values exhibit a higher attribution quality than what is known from previous work. Further, we show consistent changes in the attribution quality when varying the network design, indicating that some standard design choices promote attribution quality.

Related papers

A Meaningful Perturbation Metric for Evaluating Explainability Methods [55.09730499143998]
We introduce a novel approach, which harnesses image generation models to perform targeted perturbation. Specifically, we focus on inpainting only the high-relevance pixels of an input image to modify the model's predictions while preserving image fidelity. This is in contrast to existing approaches, which often produce out-of-distribution modifications, leading to unreliable results.
arXiv Detail & Related papers (2025-04-09T11:46:41Z)
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing [84.16442052968615]
We introduce RISEBench, the first benchmark for evaluating Reasoning-Informed viSual Editing (RISE)<n>RISEBench focuses on four key reasoning categories: Temporal, Causal, Spatial, and Logical Reasoning.<n>We conduct experiments evaluating nine prominent visual editing models, comprising both open-source and proprietary models.
arXiv Detail & Related papers (2025-04-03T17:59:56Z)
Beyond Accuracy: What Matters in Designing Well-Behaved Models? [53.252827682118955]
We show that vision-language models exhibit high fairness on ImageNet-1k classification and strong robustness against domain changes. We conclude our study by introducing the QUBA score, a novel metric that ranks models across multiple dimensions of quality.
arXiv Detail & Related papers (2025-03-21T12:54:18Z)
ProtoS-ViT: Visual foundation models for sparse self-explainable classifications [0.6249768559720122]
Prototypical networks aim to build intrinsically explainable models based on the linear summation of concepts. This work first proposes an extensive set of quantitative and qualitative metrics which allow to identify drawbacks in current prototypical networks. It then introduces a novel architecture which provides compact explanations, outperforming current prototypical models in terms of explanation quality.
arXiv Detail & Related papers (2024-06-14T13:36:30Z)
Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics [54.08757792080732]
We propose integrating deep features from pre-trained visual models with a statistical analysis model to achieve opinion-unaware BIQA (OU-BIQA) Our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models.
arXiv Detail & Related papers (2024-05-29T06:09:34Z)
Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly. Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness. Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings. This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
Learning Generalizable Perceptual Representations for Data-Efficient No-Reference Image Quality Assessment [7.291687946822539]
A major drawback of state-of-the-art NR-IQA techniques is their reliance on a large number of human annotations. We enable the learning of low-level quality features to distortion types by introducing a novel quality-aware contrastive loss. We design zero-shot quality predictions from both pathways in a completely blind setting.
arXiv Detail & Related papers (2023-12-08T05:24:21Z)
Adaptive Contextual Perception: How to Generalize to New Backgrounds and Ambiguous Objects [75.15563723169234]
We investigate how vision models adaptively use context for out-of-distribution generalization. We show that models that excel in one setting tend to struggle in the other. To replicate the generalization abilities of biological vision, computer vision models must have factorized object vs. background representations.
arXiv Detail & Related papers (2023-06-09T15:29:54Z)
Towards Reliable Assessments of Demographic Disparities in Multi-Label Image Classifiers [11.973749734226852]
We consider multi-label image classification and, specifically, object categorization tasks. Design choices and trade-offs for measurement involve more nuance than discussed in prior computer vision literature. We identify several design choices that look merely like implementation details but significantly impact the conclusions of assessments.
arXiv Detail & Related papers (2023-02-16T20:34:54Z)
Assessing Out-of-Domain Language Model Performance from Few Examples [38.245449474937914]
We address the task of predicting out-of-domain (OOD) performance in a few-shot fashion. We benchmark the performance on this task when looking at model accuracy on the few-shot examples. We show that attribution-based factors can help rank relative model OOD performance.
arXiv Detail & Related papers (2022-10-13T04:45:26Z)
FUNQUE: Fusion of Unified Quality Evaluators [42.41484412777326]
Fusion-based quality assessment has emerged as a powerful method for developing high-performance quality models. We propose FUNQUE, a quality model that fuses unified quality evaluators.
arXiv Detail & Related papers (2022-02-23T00:21:43Z)
How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion. Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z)
Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP) By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently. Experimental results on real-world texts and images demonstrate the effectiveness, sample quality as well as efficiency of our designed framework.
arXiv Detail & Related papers (2021-01-18T08:37:13Z)
Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance. We provide tractable algorithms to compute the range of attainable group-level predictive disparities. We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
Better Model Selection with a new Definition of Feature Importance [8.914907178577476]
Feature importance aims at measuring how crucial each input feature is for model prediction. In this paper, we propose a new tree-model explanation approach for model selection.
arXiv Detail & Related papers (2020-09-16T14:32:22Z)
Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives. Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models. As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.