What You See is What You Classify: Black Box Attributions
- URL: http://arxiv.org/abs/2205.11266v1
- Date: Mon, 23 May 2022 12:30:04 GMT
- Title: What You See is What You Classify: Black Box Attributions
- Authors: Steven Stalder, Nathanaël Perraudin, Radhakrishna Achanta, Fernando Perez-Cruz, Michele Volpi
- Abstract summary: We train a deep network, the Explainer, to predict attributions for a pre-trained black-box classifier, the Explanandum.
Unlike most existing approaches, ours is capable of directly generating very distinct class-specific masks.
We show that our attributions are superior to established methods both visually and quantitatively.
- Score: 61.998683569022006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An important step towards explaining deep image classifiers lies in the
identification of image regions that contribute to individual class scores in
the model's output. However, doing this accurately is a difficult task due to
the black-box nature of such networks. Most existing approaches find such
attributions either using activations and gradients or by repeatedly perturbing
the input. We instead address this challenge by training a second deep network,
the Explainer, to predict attributions for a pre-trained black-box classifier,
the Explanandum. These attributions are in the form of masks that only show the
classifier-relevant parts of an image, masking out the rest. Our approach
produces sharper and more boundary-precise masks when compared to the saliency
maps generated by other methods. Moreover, unlike most existing approaches,
ours is capable of directly generating very distinct class-specific masks.
Finally, the proposed method is very efficient for inference since it only
takes a single forward pass through the Explainer to generate all
class-specific masks. We show that our attributions are superior to established
methods both visually and quantitatively, by evaluating them on the PASCAL
VOC-2007 and Microsoft COCO-2014 datasets.
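As a rough illustration of the setup described in the abstract, the sketch below pairs a frozen, pre-trained classifier (the Explanandum) with a mask-predicting network (the Explainer) that outputs one soft mask per class in a single forward pass; the masked image is then re-scored by the classifier. This is a minimal sketch only: the ResNet-50 backbones, the one-by-one convolutional mask head, and the sparsity comment are assumptions, not the paper's exact architecture or training objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

NUM_CLASSES = 20  # e.g. PASCAL VOC-2007

class Explainer(nn.Module):
    """Predicts one soft attribution mask per class in a single forward pass.
    Backbone and head are illustrative stand-ins, not the paper's exact design."""
    def __init__(self, num_classes: int):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # (B, 2048, h, w)
        self.mask_head = nn.Conv2d(2048, num_classes, kernel_size=1)

    def forward(self, x):
        feats = self.features(x)
        masks = torch.sigmoid(self.mask_head(feats))            # (B, K, h, w), values in [0, 1]
        return F.interpolate(masks, size=x.shape[-2:], mode="bilinear", align_corners=False)

# Frozen black-box classifier (the Explanandum).
explanandum = torchvision.models.resnet50(weights=None)
explanandum.fc = nn.Linear(2048, NUM_CLASSES)
explanandum.eval()
for p in explanandum.parameters():
    p.requires_grad_(False)

explainer = Explainer(NUM_CLASSES)

x = torch.randn(2, 3, 224, 224)          # dummy batch
masks = explainer(x)                      # all class-specific masks at once: (B, K, H, W)

# Re-score the image with only the class-c region visible; a training loss would reward
# masks that preserve the class score while staying small and sharp (assumed objective).
c = 7
scores = explanandum(x * masks[:, c:c + 1])
```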
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
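The ColorMAE entry above describes generating binary mask patterns by filtering random noise. A hedged, minimal sketch of that idea follows: sample noise over the patch grid, low-pass filter it (the Gaussian filter is an assumption; ColorMAE explores several noise "colors"), and mask the patches with the highest filtered values until the target masking ratio is reached.

```python
import torch
import torch.nn.functional as F

def filtered_noise_mask(grid: int = 14, mask_ratio: float = 0.75,
                        kernel: int = 5, sigma: float = 1.0) -> torch.Tensor:
    """Data-independent binary patch mask from filtered random noise (illustrative sketch)."""
    noise = torch.rand(1, 1, grid, grid)
    # Build a Gaussian low-pass filter (assumed filter choice).
    coords = (torch.arange(kernel) - kernel // 2).float()
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    kernel2d = (g[:, None] * g[None, :]).view(1, 1, kernel, kernel)
    smooth = F.conv2d(noise, kernel2d, padding=kernel // 2).flatten()
    # Mask the patches with the largest filtered values until the ratio is met.
    num_masked = int(mask_ratio * grid * grid)
    mask = torch.zeros(grid * grid, dtype=torch.bool)
    mask[smooth.topk(num_masked).indices] = True   # True = patch is masked out
    return mask.view(grid, grid)

mask = filtered_noise_mask()
print(mask.float().mean())   # approximately 0.75
```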
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
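The AutoMAE entry above mentions a Gumbel-Softmax-based mask generator. The minimal sketch below shows only the differentiable-sampling ingredient; the adversarial training and the full masked-autoencoder pipeline are omitted, and the per-patch logit head and ratio penalty are assumptions.

```python
import torch
import torch.nn.functional as F

num_patches, mask_ratio, tau = 196, 0.75, 1.0

# Per-patch "keep vs. mask" logits, e.g. produced by a small network over patch embeddings (assumed).
logits = torch.randn(num_patches, 2, requires_grad=True)

# Straight-through Gumbel-Softmax: hard one-hot decisions forward, soft gradients backward.
decisions = F.gumbel_softmax(logits, tau=tau, hard=True)   # (num_patches, 2)
mask = decisions[:, 1]                                      # 1.0 where the patch is masked

# A differentiable penalty can keep the fraction of masked patches near the target ratio.
ratio_loss = (mask.mean() - mask_ratio) ** 2
ratio_loss.backward()                                       # gradients reach the mask generator
```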
- Towards Improved Input Masking for Convolutional Neural Networks [66.99060157800403]
We propose a new masking method for CNNs, which we call layer masking.
We show that our method is able to eliminate or minimize the influence of the mask shape or color on the output of the model.
We also demonstrate how the shape of the mask may leak information about the class, thus affecting estimates of model reliance on class-relevant features.
arXiv Detail & Related papers (2022-11-26T19:31:49Z)
- Exploiting Shape Cues for Weakly Supervised Semantic Segmentation [15.791415215216029]
Weakly supervised semantic segmentation (WSSS) aims to produce pixel-wise class predictions with only image-level labels for training.
We propose to exploit shape information to supplement the texture-biased property of convolutional neural networks (CNNs).
We further refine the predictions in an online fashion with a novel refinement method that takes into account both the class and the color affinities.
arXiv Detail & Related papers (2022-08-08T17:25:31Z)
- ContrastMask: Contrastive Learning to Segment Every Thing [18.265503138997794]
We propose ContrastMask, which learns a mask segmentation model on both seen and unseen categories.
Features from the mask regions (foreground) are pulled together and contrasted against those from the background, and vice versa.
Exhaustive experiments on the COCO dataset demonstrate the superiority of our method.
arXiv Detail & Related papers (2022-03-18T07:41:48Z)
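The ContrastMask entry above describes pulling features inside the mask region together while pushing them away from background features. A minimal sketch of one such pixel-level contrastive term follows; the prototype pooling, the temperature, and the cross-entropy form are assumptions rather than the paper's exact query/key design.

```python
import torch
import torch.nn.functional as F

def fg_bg_contrastive_loss(feats: torch.Tensor, mask: torch.Tensor, temperature: float = 0.1):
    """feats: (B, C, H, W) pixel embeddings; mask: (B, 1, H, W) binary foreground mask.
    Pixels inside the mask are pulled toward the foreground prototype and pushed away
    from the background prototype, and vice versa (illustrative sketch)."""
    feats = F.normalize(feats, dim=1)
    fg_proto = F.normalize((feats * mask).sum((2, 3)) / mask.sum((2, 3)).clamp(min=1), dim=1)
    bg_proto = F.normalize((feats * (1 - mask)).sum((2, 3)) / (1 - mask).sum((2, 3)).clamp(min=1), dim=1)

    # Similarity of every pixel to each prototype: (B, 2, H, W) logits over {foreground, background}.
    protos = torch.stack([fg_proto, bg_proto], dim=1)               # (B, 2, C)
    sims = torch.einsum("bchw,bkc->bkhw", feats, protos) / temperature

    # Foreground pixels should prefer the fg prototype (index 0), background pixels the bg prototype (index 1).
    targets = (1 - mask).long().squeeze(1)                          # (B, H, W)
    return F.cross_entropy(sims, targets)

loss = fg_bg_contrastive_loss(torch.randn(4, 64, 32, 32),
                              (torch.rand(4, 1, 32, 32) > 0.5).float())
```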
- Few-shot semantic segmentation via mask aggregation [5.886986014593717]
Few-shot semantic segmentation aims to recognize novel classes from only a few labelled samples.
Previous works have typically regarded it as a pixel-wise classification problem.
We introduce a mask-based classification method for addressing this problem.
arXiv Detail & Related papers (2022-02-15T07:13:09Z)
- Per-Pixel Classification is Not All You Need for Semantic Segmentation [184.2905747595058]
Mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks.
We propose MaskFormer, a simple mask classification model which predicts a set of binary masks.
Our method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.
arXiv Detail & Related papers (2021-07-13T17:59:50Z)
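The MaskFormer entry above frames segmentation as mask classification: the model predicts a fixed set of N binary masks, each paired with a class distribution, rather than per-pixel class logits. The toy sketch below shows only how such outputs combine into a semantic map; the transformer decoder, matching, and losses are omitted, and the random tensors stand in for real model outputs.

```python
import torch

B, N, K, H, W = 2, 100, 20, 64, 64        # batch, mask queries, classes, spatial size

# Outputs a mask-classification model would produce (random stand-ins here):
mask_logits = torch.randn(B, N, H, W)      # one binary-mask logit map per query
class_logits = torch.randn(B, N, K + 1)    # per-query class scores, incl. a "no object" class

# Combine into per-pixel semantic scores by weighting each query's mask with its class probabilities.
mask_probs = mask_logits.sigmoid()                           # (B, N, H, W)
class_probs = class_logits.softmax(-1)[..., :K]              # drop "no object" -> (B, N, K)
semantic = torch.einsum("bnk,bnhw->bkhw", class_probs, mask_probs)
prediction = semantic.argmax(dim=1)                          # (B, H, W) class map
```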
- Image Inpainting by End-to-End Cascaded Refinement with Mask Awareness [66.55719330810547]
Inpainting arbitrary missing regions is challenging because learning valid features for various masked regions is nontrivial.
We propose a novel mask-aware inpainting solution that learns multi-scale features for missing regions in the encoding phase.
Our framework is validated both quantitatively and qualitatively via extensive experiments on three public datasets.
arXiv Detail & Related papers (2021-04-28T13:17:47Z)
- Investigating and Simplifying Masking-based Saliency Methods for Model Interpretability [5.387323728379395]
Saliency maps that identify the most informative regions of an image are valuable for model interpretability.
A common approach to creating saliency maps involves generating input masks that mask out portions of an image.
We show that a masking model can be trained with as few as 10 examples per class and still generate saliency maps with only a 0.7-point increase in localization error.
arXiv Detail & Related papers (2020-10-19T18:00:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.