ATCON: Attention Consistency for Vision Models
- URL: http://arxiv.org/abs/2210.09705v1
- Date: Tue, 18 Oct 2022 09:30:20 GMT
- Title: ATCON: Attention Consistency for Vision Models
- Authors: Ali Mirzazadeh, Florian Dubost, Maxwell Pike, Krish Maniar, Max Zuo,
Christopher Lee-Messer, Daniel Rubin
- Abstract summary: We propose an unsupervised fine-tuning method that improves the consistency of attention maps.
We show results on Grad-CAM and Integrated Gradients in an ablation study.
Those improved attention maps may help clinicians better understand vision model predictions.
- Score: 0.8312466807725921
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attention, or attribution, map methods are designed to highlight
regions of the model's input that were discriminative for its predictions.
However, different attention map methods can highlight different regions of
the input, sometimes offering contradictory explanations for a prediction. This
effect is exacerbated when the training set is small. This indicates that
either the model learned incorrect representations or that the attention map
methods did not accurately estimate the model's representations. We propose an
unsupervised fine-tuning method that optimizes the consistency of attention
maps and show that it improves both classification performance and the quality
of attention maps. We propose an implementation for two state-of-the-art
attention computation methods, Grad-CAM and Guided Backpropagation, which
relies on an input masking technique. We also show results on Grad-CAM and
Integrated Gradients in an ablation study. We evaluate this method on our own
dataset of event detection in continuous video recordings of hospital patients
aggregated and curated for this work. As a sanity check, we also evaluate the
proposed method on PASCAL VOC and SVHN. With the proposed method and small
training sets, we achieve a 6.6-point lift in F1 score over the baselines on
our video dataset, a 2.9-point lift in F1 score on PASCAL, and a 1.8-point
lift in mean Intersection over Union over Grad-CAM for weakly supervised
detection on PASCAL. These improved attention maps may help clinicians better
understand vision model predictions and ease the deployment of machine learning
systems into clinical care. We share part of the code for this article at the
following repository: https://github.com/alimirzazadeh/SemisupervisedAttention.
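The abstract does not spell out the form of the consistency objective, so the following is only a minimal sketch of one plausible unsupervised loss: penalize disagreement between two attention maps for the same input (e.g. a Grad-CAM map and a Guided Backpropagation map resized to a common resolution) after min-max normalization. The function names are illustrative and are not taken from the authors' repository.

```python
import numpy as np

def normalize_map(a):
    """Min-max normalize an attention map to [0, 1]."""
    a = a - a.min()
    denom = a.max()
    return a / denom if denom > 0 else a

def consistency_loss(map_a, map_b):
    """Mean squared disagreement between two normalized attention maps.

    map_a, map_b: 2D arrays, e.g. a Grad-CAM map and a Guided
    Backpropagation map brought to the same spatial resolution.
    A loss of 0 means the two methods highlight identical regions.
    """
    return float(np.mean((normalize_map(map_a) - normalize_map(map_b)) ** 2))

# Identical maps are perfectly consistent.
m = np.random.rand(7, 7)
print(consistency_loss(m, m))  # → 0.0
```

In a fine-tuning loop, a differentiable version of this quantity would be added to the training objective and minimized without labels, which is what makes the method unsupervised.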
Related papers
- Improving Online Lane Graph Extraction by Object-Lane Clustering [106.71926896061686]
We propose an architecture and loss formulation to improve the accuracy of local lane graph estimates.
The proposed method learns to assign the objects to centerlines by considering the centerlines as cluster centers.
We show that our method can achieve significant performance improvements by using the outputs of existing 3D object detection methods.
arXiv Detail & Related papers (2023-07-20T15:21:28Z)
- HomE: Homography-Equivariant Video Representation Learning [62.89516761473129]
We propose a novel method for representation learning of multi-view videos.
Our method learns an implicit mapping between different views, culminating in a representation space that maintains the homography relationship between neighboring views.
On action classification, our method obtains 96.4% 3-fold accuracy on the UCF101 dataset, better than most state-of-the-art self-supervised learning methods.
arXiv Detail & Related papers (2023-06-02T15:37:43Z)
- Fine-Grained Visual Classification using Self Assessment Classifier [12.596520707449027]
Extracting discriminative features plays a crucial role in the fine-grained visual classification task.
In this paper, we introduce a Self Assessment Classifier, which simultaneously leverages the representation of the image and the top-k predicted classes.
We show that our method achieves new state-of-the-art results on CUB200-2011, Stanford Dog, and FGVC Aircraft datasets.
arXiv Detail & Related papers (2022-05-21T07:41:27Z)
- Object Class Aware Video Anomaly Detection through Image Translation [1.2944868613449219]
This paper proposes a novel two-stream object-aware VAD method that learns the normal appearance and motion patterns through image translation tasks.
The results show significant improvements over previous methods: detections by our method are fully explainable, and anomalies are localized accurately in the frames.
arXiv Detail & Related papers (2022-05-03T18:04:27Z)
- Metrics for saliency map evaluation of deep learning explanation methods [0.0]
We critically analyze the Deletion Area Under Curve (DAUC) and Insertion Area Under Curve (IAUC) metrics proposed by Petsiuk et al.
These metrics were designed to evaluate the faithfulness of saliency maps generated by generic methods such as Grad-CAM or RISE.
We show that the actual saliency score values given by the saliency map are ignored as only the ranking of the scores is taken into account.
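To illustrate how a deletion-style metric works, here is a simplified sketch (not Petsiuk et al.'s reference implementation; the `model` callable and step schedule are placeholders): pixels are removed in order of decreasing saliency and the model's score is averaged along the deletion trajectory. Note that only the `argsort` of the saliency enters the computation, which is exactly the critique above: the score values themselves are discarded.

```python
import numpy as np

def deletion_auc(image, saliency, model, n_steps=10):
    """Deletion metric: zero out pixels in order of decreasing saliency
    and average the model's score over the deletion trajectory.
    Lower is better: a faithful map removes the evidence fastest.
    """
    order = np.argsort(saliency.ravel())[::-1]  # most salient pixels first
    x = image.copy().ravel()
    scores = [model(x.reshape(image.shape))]
    step = max(1, len(order) // n_steps)
    for i in range(0, len(order), step):
        x[order[i:i + step]] = 0.0  # delete the next batch of pixels
        scores.append(model(x.reshape(image.shape)))
    return float(np.mean(scores))  # area under the deletion curve
```

With a toy "model" that just sums pixel intensities, each deletion step lowers the score, and the returned mean traces the area under that decreasing curve.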
arXiv Detail & Related papers (2022-01-31T14:59:36Z)
- PANet: Perspective-Aware Network with Dynamic Receptive Fields and Self-Distilling Supervision for Crowd Counting [63.84828478688975]
We propose a novel perspective-aware approach called PANet to address the perspective problem.
Based on the observation that the size of the objects varies greatly in one image due to the perspective effect, we propose the dynamic receptive fields (DRF) framework.
The framework is able to adjust the receptive field by the dilated convolution parameters according to the input image, which helps the model to extract more discriminative features for each local region.
arXiv Detail & Related papers (2021-10-31T04:43:05Z)
- CAMERAS: Enhanced Resolution And Sanity preserving Class Activation Mapping for image saliency [61.40511574314069]
Backpropagation image saliency aims at explaining model predictions by estimating model-centric importance of individual pixels in the input.
We propose CAMERAS, a technique to compute high-fidelity backpropagation saliency maps without requiring any external priors.
arXiv Detail & Related papers (2021-06-20T08:20:56Z)
- Keep CALM and Improve Visual Feature Attribution [42.784665606132]
The class activation mapping, or CAM, has been the cornerstone of feature attribution methods for multiple vision tasks.
We improve CAM by explicitly incorporating a latent variable encoding the location of the cue for recognition in the formulation.
The resulting model, class activation latent mapping, or CALM, is trained with the expectation-maximization algorithm.
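For context, the vanilla CAM that CALM builds on can be sketched in a few lines. This is plain CAM only, not CALM's latent-variable, EM-trained formulation, and the shapes and names are illustrative:

```python
import numpy as np

def class_activation_map(features, weights, class_idx):
    """Vanilla CAM: weight the final conv feature maps by the classifier
    weights of the target class and sum over channels.

    features: (C, H, W) activations from the last convolutional layer
    weights:  (num_classes, C) weights of the final linear layer
    """
    cam = np.tensordot(weights[class_idx], features, axes=1)  # -> (H, W)
    return np.maximum(cam, 0)  # keep positive evidence only, as in CAM
```

CALM replaces this deterministic weighted sum with a latent location variable, so attribution becomes part of the probabilistic model rather than a post-hoc readout.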
arXiv Detail & Related papers (2021-06-15T03:33:25Z)
- Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization [87.96102461221415]
We develop an algorithm that provides per-class explainability.
In an extensive battery of experiments, we demonstrate the ability of our method to produce class-specific visualizations.
arXiv Detail & Related papers (2020-12-03T18:48:39Z)
- Articulation-aware Canonical Surface Mapping [54.0990446915042]
We tackle the tasks of predicting a Canonical Surface Mapping (CSM) that indicates the mapping from 2D pixels to corresponding points on a canonical template shape, and inferring the articulation and pose of the template corresponding to the input image.
Our key insight is that these tasks are geometrically related, and we can obtain supervisory signal via enforcing consistency among the predictions.
We empirically show that allowing articulation helps learn more accurate CSM prediction, and that enforcing the consistency with predicted CSM is similarly critical for learning meaningful articulation.
arXiv Detail & Related papers (2020-04-01T17:56:45Z)
- Uncertainty based Class Activation Maps for Visual Question Answering [30.859101872119517]
We propose a method that obtains gradient-based certainty estimates that also provide visual attention maps.
We incorporate modern probabilistic deep learning methods that we further improve by using the gradients for these estimates.
The proposed technique can be thought of as a recipe for obtaining improved certainty estimates and explanations for deep learning models.
arXiv Detail & Related papers (2020-01-23T19:54:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.