Benchmarking the Attribution Quality of Vision Models
- URL: http://arxiv.org/abs/2407.11910v2
- Date: Mon, 09 Dec 2024 11:39:08 GMT
- Title: Benchmarking the Attribution Quality of Vision Models
- Authors: Robin Hesse, Simone Schaub-Meyer, Stefan Roth
- Abstract summary: We propose a novel evaluation protocol that overcomes two fundamental limitations of the widely used incremental-deletion protocol.
This allows us to evaluate 23 attribution methods and how different design choices of popular vision backbones affect their attribution quality.
We find that intrinsically explainable models outperform standard models and that raw attribution values exhibit a higher attribution quality than what is known from previous work.
- Score: 13.255247017616687
- Abstract: Attribution maps are one of the most established tools to explain the functioning of computer vision models. They assign importance scores to input features, indicating how relevant each feature is for the prediction of a deep neural network. While much research has gone into proposing new attribution methods, their proper evaluation remains a difficult challenge. In this work, we propose a novel evaluation protocol that overcomes two fundamental limitations of the widely used incremental-deletion protocol, i.e., the out-of-domain issue and lacking inter-model comparisons. This allows us to evaluate 23 attribution methods and how different design choices of popular vision backbones affect their attribution quality. We find that intrinsically explainable models outperform standard models and that raw attribution values exhibit a higher attribution quality than what is known from previous work. Further, we show consistent changes in the attribution quality when varying the network design, indicating that some standard design choices promote attribution quality.
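For intuition, the following is a minimal sketch of the incremental-deletion protocol that this work critiques, assuming a PyTorch image classifier; the step count, zero baseline, and mean-score summary are illustrative choices, not the paper's exact setup.

```python
import numpy as np
import torch

def deletion_curve(model, image, attribution, target, steps=50, baseline=0.0):
    """Incremental-deletion protocol: occlude the most-attributed pixels
    first and track how the target-class probability decays.

    image:       (C, H, W) float tensor
    attribution: (H, W) numpy array of per-pixel importance scores
    """
    c, h, w = image.shape
    order = np.argsort(attribution.ravel())[::-1].copy()  # most important first
    per_step = int(np.ceil(h * w / steps))
    flat = image.reshape(c, -1).clone()                   # flattened working copy
    scores = []
    model.eval()
    with torch.no_grad():
        for i in range(steps + 1):
            prob = torch.softmax(model(flat.view(1, c, h, w)), dim=1)[0, target]
            scores.append(prob.item())
            chunk = torch.from_numpy(order[i * per_step:(i + 1) * per_step])
            flat[:, chunk] = baseline                     # "delete" the next pixels
    # Lower area under the deletion curve (approximated here by the mean
    # score) = better attribution. Note that the occluded inputs drift away
    # from the training distribution -- the out-of-domain issue the proposed
    # protocol is designed to avoid.
    return float(np.mean(scores))
```

The final comment marks exactly the weakness the paper targets: occluded images are unlike anything the model saw during training, so a falling score may reflect domain shift rather than attribution quality.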
Related papers
- ProtoS-ViT: Visual foundation models for sparse self-explainable classifications [0.6249768559720122]
Prototypical networks aim to build intrinsically explainable models based on the linear summation of concepts.
This work first proposes an extensive set of quantitative and qualitative metrics for identifying drawbacks in current prototypical networks.
It then introduces a novel architecture which provides compact explanations, outperforming current prototypical models in terms of explanation quality.
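As a rough illustration of classification by linear summation of concepts, a toy prototype head might look as follows; the dimensions, cosine similarity, and max-pooling are assumptions for the sketch, not ProtoS-ViT's actual architecture.

```python
import torch
import torch.nn as nn

class PrototypeHead(nn.Module):
    """Toy prototypical head: score each class as a linear summation of
    similarities between patch features and learned concept prototypes."""

    def __init__(self, dim=768, num_prototypes=32, num_classes=10):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim))
        # The linear layer mixes prototype activations into class logits;
        # its weights make the explanation readable (prototype -> class).
        self.classifier = nn.Linear(num_prototypes, num_classes, bias=False)

    def forward(self, patch_features):            # (B, N_patches, dim)
        # Cosine similarity of every patch to every prototype.
        f = nn.functional.normalize(patch_features, dim=-1)
        p = nn.functional.normalize(self.prototypes, dim=-1)
        sim = f @ p.t()                           # (B, N_patches, num_prototypes)
        # Max-pool over patches: where does each concept fire strongest?
        activations = sim.max(dim=1).values       # (B, num_prototypes)
        return self.classifier(activations)       # class logits
```

Because the prediction is a weighted sum of concept activations, the contribution of each prototype to each class can be read directly off the classifier weights, which is what makes such models intrinsically explainable.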
arXiv Detail & Related papers (2024-06-14T13:36:30Z)
- Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics [54.08757792080732]
We propose integrating deep features from pre-trained visual models with a statistical analysis model to achieve opinion-unaware BIQA (OU-BIQA).
Our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models.
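One way such a statistical analysis model can work is NIQE-style: fit a Gaussian to deep features of pristine images and score a test image by its distance to those statistics. The sketch below assumes the features have already been extracted from a pre-trained model; it is a generic illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def fit_pristine_stats(features):
    """Fit a multivariate Gaussian to deep features of pristine images.
    features: (N, D) array, one row per pristine patch/image."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    return mu, cov

def quality_distance(test_features, mu, cov):
    """NIQE-style score: Mahalanobis-like distance between the test image's
    feature statistics and the pristine statistics. Higher = worse quality,
    and no human opinion scores are needed at any point."""
    mu_t = test_features.mean(axis=0)
    cov_t = np.cov(test_features, rowvar=False)
    diff = mu - mu_t
    pooled = (cov + cov_t) / 2.0
    return float(diff @ np.linalg.pinv(pooled) @ diff)
```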
arXiv Detail & Related papers (2024-05-29T06:09:34Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address the mismatch between CLIP's pre-training objective and the IQA task using prompting techniques, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
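For context, a zero-shot CLIP quality probe with hand-written antonym prompts (in the spirit of CLIP-IQA) takes only a few lines; prompt-learning methods such as this paper's replace the fixed text with learned prompt embeddings. The checkpoint and prompt pair below are illustrative choices.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Zero-shot quality probe: compare the image against an antonym prompt pair.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def quality_score(image: Image.Image) -> float:
    prompts = ["a good photo.", "a bad photo."]   # hand-written stand-ins for learned prompts
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # (1, 2) image-text similarities
    # Softmax over the two prompts; probability of "good" serves as the score.
    return logits.softmax(dim=-1)[0, 0].item()
```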
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- Learning Generalizable Perceptual Representations for Data-Efficient No-Reference Image Quality Assessment [7.291687946822539]
A major drawback of state-of-the-art NR-IQA techniques is their reliance on a large number of human annotations.
We enable the learning of low-level quality features agnostic to distortion types by introducing a novel quality-aware contrastive loss.
The framework enables zero-shot quality prediction from both pathways in a completely blind setting.
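A plausible form of a quality-aware contrastive loss treats views that share a distortion level as positives, as in the generic supervised-contrastive sketch below; this is an illustration of the idea, not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def quality_aware_contrastive_loss(z, quality_labels, temperature=0.1):
    """Supervised-contrastive-style sketch: embeddings of views that share a
    distortion/quality level are pulled together, all others pushed apart.

    z:              (B, D) projected embeddings
    quality_labels: (B,) integer distortion-level labels
    """
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))    # drop self-pairs
    pos = (quality_labels[:, None] == quality_labels[None, :]) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Mean log-likelihood of the positives for each anchor that has any.
    pos_log_prob = log_prob.masked_fill(~pos, 0.0)
    has_pos = pos.any(dim=1)
    loss = -pos_log_prob[has_pos].sum(dim=1) / pos[has_pos].sum(dim=1)
    return loss.mean()
```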
arXiv Detail & Related papers (2023-12-08T05:24:21Z)
- Adaptive Contextual Perception: How to Generalize to New Backgrounds and Ambiguous Objects [75.15563723169234]
We investigate how vision models adaptively use context for out-of-distribution generalization.
We show that models that excel in one setting tend to struggle in the other.
To replicate the generalization abilities of biological vision, computer vision models must have factorized object vs. background representations.
arXiv Detail & Related papers (2023-06-09T15:29:54Z)
- Towards Reliable Assessments of Demographic Disparities in Multi-Label Image Classifiers [11.973749734226852]
We consider multi-label image classification and, specifically, object categorization tasks.
Design choices and trade-offs for measurement involve more nuance than discussed in prior computer vision literature.
We identify several design choices that look merely like implementation details but significantly impact the conclusions of assessments.
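As an example of such a consequential "implementation detail", even a bare-bones per-group recall gap for a single label involves at least two silent choices, the decision threshold and the handling of small groups; the metric and aggregation below are illustrative, not the paper's.

```python
import numpy as np

def recall_gap(y_true, y_score, groups, threshold=0.5):
    """Per-group recall disparity for one label of a multi-label classifier.

    y_true:  (N,) binary ground truth for the label
    y_score: (N,) predicted probabilities
    groups:  (N,) demographic group id per image
    Returns max minus min per-group recall -- one of several plausible
    aggregations, each of which can change the assessment's conclusion.
    """
    y_pred = y_score >= threshold    # the threshold itself is a design choice
    recalls = []
    for g in np.unique(groups):
        mask = (groups == g) & (y_true == 1)
        if mask.sum() == 0:          # skipping small groups: another consequential choice
            continue
        recalls.append(y_pred[mask].mean())
    return max(recalls) - min(recalls)
```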
arXiv Detail & Related papers (2023-02-16T20:34:54Z)
- Assessing Out-of-Domain Language Model Performance from Few Examples [38.245449474937914]
We address the task of predicting out-of-domain (OOD) performance in a few-shot fashion.
We first benchmark performance on this task when using model accuracy on the few-shot examples as the predictor.
We show that attribution-based factors can help rank relative model OOD performance.
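One illustrative attribution-based factor is the entropy of the attribution map, sketched below; the factor, the ranking direction, and the helper names are assumptions for illustration, and the paper's actual factors may differ.

```python
import numpy as np

def attribution_entropy(attribution):
    """Entropy of a normalized attribution map: diffuse maps score high,
    focused maps low. Used here as one illustrative attribution-based
    factor for ranking models without OOD labels."""
    a = np.abs(attribution).ravel()
    p = a / a.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def rank_models(factor_per_model):
    """Rank candidate models by a factor averaged over the few-shot OOD
    examples (ascending; assumes focused attributions generalize better)."""
    return sorted(factor_per_model, key=factor_per_model.get)
```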
arXiv Detail & Related papers (2022-10-13T04:45:26Z)
- How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
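Sample-level precision-recall analysis of this kind can be sketched with k-NN balls in feature space, in the spirit of improved precision-and-recall metrics; the paper's metric refines such ideas, so the function below is only a baseline illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_precision_recall(real, fake, k=3):
    """Sample-level precision/recall: a fake sample counts as precise if it
    falls inside some real sample's k-NN ball (fidelity); recall is the
    symmetric check for real samples (diversity).

    real, fake: (N, D) feature arrays
    """
    def radii(x):
        d = cdist(x, x)                  # pairwise distances
        d.sort(axis=1)
        return d[:, k]                   # distance to k-th neighbor (col 0 is self)

    r_real, r_fake = radii(real), radii(fake)
    d_fr = cdist(fake, real)             # fake-to-real distances
    precision = (d_fr <= r_real[None, :]).any(axis=1).mean()
    recall = (d_fr.T <= r_fake[None, :]).any(axis=1).mean()
    return precision, recall
```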
arXiv Detail & Related papers (2021-02-17T18:25:30Z)
- Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP).
By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently.
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality, and efficiency of the designed framework.
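A generic latent-optimization sketch of attribute-conditioned counterfactual search is shown below; the generator and classifier interfaces, step count, and penalty weight are assumptions for illustration, not the paper's exact AIP procedure.

```python
import torch

def attribute_informed_counterfactual(generator, classifier, z0, attrs,
                                      target_label, steps=200, lr=0.05):
    """Search the latent space of an attribute-conditioned generator for a
    sample the classifier assigns the desired label, staying near z0.

    generator:  g(z, attrs) -> data instance, conditioned on attributes
    classifier: f(x) -> (1, num_classes) logits
    """
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    target = torch.tensor([target_label])
    for _ in range(steps):
        x = generator(z, attrs)
        # Flip the prediction while penalizing drift from the original latent.
        loss = (torch.nn.functional.cross_entropy(classifier(x), target)
                + 0.1 * (z - z0).pow(2).sum())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator(z.detach(), attrs)
```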
arXiv Detail & Related papers (2021-01-18T08:37:13Z)
- Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)