What You See is What You Classify: Black Box Attributions
- URL: http://arxiv.org/abs/2205.11266v1
- Date: Mon, 23 May 2022 12:30:04 GMT
- Title: What You See is What You Classify: Black Box Attributions
- Authors: Steven Stalder, Nathanaël Perraudin, Radhakrishna Achanta, Fernando Perez-Cruz, Michele Volpi
- Abstract summary: We train a deep network, the Explainer, to predict attributions for a pre-trained black-box classifier, the Explanandum.
Unlike most existing approaches, ours is capable of directly generating very distinct class-specific masks.
We show that our attributions are superior to established methods both visually and quantitatively.
- Score: 61.998683569022006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An important step towards explaining deep image classifiers lies in the
identification of image regions that contribute to individual class scores in
the model's output. However, doing this accurately is a difficult task due to
the black-box nature of such networks. Most existing approaches find such
attributions either using activations and gradients or by repeatedly perturbing
the input. We instead address this challenge by training a second deep network,
the Explainer, to predict attributions for a pre-trained black-box classifier,
the Explanandum. These attributions are in the form of masks that only show the
classifier-relevant parts of an image, masking out the rest. Our approach
produces sharper and more boundary-precise masks when compared to the saliency
maps generated by other methods. Moreover, unlike most existing approaches,
ours is capable of directly generating very distinct class-specific masks.
Finally, the proposed method is very efficient for inference since it only
takes a single forward pass through the Explainer to generate all
class-specific masks. We show that our attributions are superior to established
methods both visually and quantitatively, by evaluating them on the PASCAL
VOC-2007 and Microsoft COCO-2014 datasets.
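As a rough illustration of the setup described in the abstract, the sketch below pairs a frozen, pre-trained classifier (the Explanandum) with a mask-predicting network (the Explainer) that outputs one soft mask per class in a single forward pass; the masked image is then re-scored by the classifier. This is a minimal sketch only: the ResNet-50 backbones, the one-by-one convolutional mask head, and the sparsity comment are assumptions, not the paper's exact architecture or training objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

NUM_CLASSES = 20  # e.g. PASCAL VOC-2007

class Explainer(nn.Module):
    """Predicts one soft attribution mask per class in a single forward pass.
    Backbone and head are illustrative stand-ins, not the paper's exact design."""
    def __init__(self, num_classes: int):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # (B, 2048, h, w)
        self.mask_head = nn.Conv2d(2048, num_classes, kernel_size=1)

    def forward(self, x):
        feats = self.features(x)
        masks = torch.sigmoid(self.mask_head(feats))            # (B, K, h, w), values in [0, 1]
        return F.interpolate(masks, size=x.shape[-2:], mode="bilinear", align_corners=False)

# Frozen black-box classifier (the Explanandum).
explanandum = torchvision.models.resnet50(weights=None)
explanandum.fc = nn.Linear(2048, NUM_CLASSES)
explanandum.eval()
for p in explanandum.parameters():
    p.requires_grad_(False)

explainer = Explainer(NUM_CLASSES)

x = torch.randn(2, 3, 224, 224)          # dummy batch
masks = explainer(x)                      # all class-specific masks at once: (B, K, H, W)

# Re-score the image with only the class-c region visible; a training loss would reward
# masks that preserve the class score while staying small and sharp (assumed objective).
c = 7
scores = explanandum(x * masks[:, c:c + 1])
```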
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
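The ColorMAE entry above describes generating binary mask patterns by filtering random noise. A hedged, minimal sketch of that idea follows: sample noise over the patch grid, low-pass filter it (the Gaussian filter is an assumption; ColorMAE explores several noise "colors"), and mask the patches with the highest filtered values until the target masking ratio is reached.

```python
import torch
import torch.nn.functional as F

def filtered_noise_mask(grid: int = 14, mask_ratio: float = 0.75,
                        kernel: int = 5, sigma: float = 1.0) -> torch.Tensor:
    """Data-independent binary patch mask from filtered random noise (illustrative sketch)."""
    noise = torch.rand(1, 1, grid, grid)
    # Build a Gaussian low-pass filter (assumed filter choice).
    coords = (torch.arange(kernel) - kernel // 2).float()
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    kernel2d = (g[:, None] * g[None, :]).view(1, 1, kernel, kernel)
    smooth = F.conv2d(noise, kernel2d, padding=kernel // 2).flatten()
    # Mask the patches with the largest filtered values until the ratio is met.
    num_masked = int(mask_ratio * grid * grid)
    mask = torch.zeros(grid * grid, dtype=torch.bool)
    mask[smooth.topk(num_masked).indices] = True   # True = patch is masked out
    return mask.view(grid, grid)

mask = filtered_noise_mask()
print(mask.float().mean())   # approximately 0.75
```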
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
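The AutoMAE entry above mentions a Gumbel-Softmax-based mask generator. The minimal sketch below shows only the differentiable-sampling ingredient; the adversarial training and the full masked-autoencoder pipeline are omitted, and the per-patch logit head and ratio penalty are assumptions.

```python
import torch
import torch.nn.functional as F

num_patches, mask_ratio, tau = 196, 0.75, 1.0

# Per-patch "keep vs. mask" logits, e.g. produced by a small network over patch embeddings (assumed).
logits = torch.randn(num_patches, 2, requires_grad=True)

# Straight-through Gumbel-Softmax: hard one-hot decisions forward, soft gradients backward.
decisions = F.gumbel_softmax(logits, tau=tau, hard=True)   # (num_patches, 2)
mask = decisions[:, 1]                                      # 1.0 where the patch is masked

# A differentiable penalty can keep the fraction of masked patches near the target ratio.
ratio_loss = (mask.mean() - mask_ratio) ** 2
ratio_loss.backward()                                       # gradients reach the mask generator
```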
- Towards Improved Input Masking for Convolutional Neural Networks [66.99060157800403]
We propose a new masking method for CNNs, which we call layer masking.
We show that our method is able to eliminate or minimize the influence of the mask shape or color on the output of the model.
We also demonstrate how the shape of the mask may leak information about the class, thus affecting estimates of model reliance on class-relevant features.
arXiv Detail & Related papers (2022-11-26T19:31:49Z)
- Exploiting Shape Cues for Weakly Supervised Semantic Segmentation [15.791415215216029]
Weakly supervised semantic segmentation (WSSS) aims to produce pixel-wise class predictions with only image-level labels for training.
We propose to exploit shape information to supplement the texture-biased property of convolutional neural networks (CNNs).
We further refine the predictions in an online fashion with a novel refinement method that takes into account both the class and the color affinities.
arXiv Detail & Related papers (2022-08-08T17:25:31Z)
- ContrastMask: Contrastive Learning to Segment Every Thing [18.265503138997794]
We propose ContrastMask, which learns a mask segmentation model on both seen and unseen categories.
Features from the mask regions (foreground) are pulled together and contrasted against those from the background, and vice versa.
Exhaustive experiments on the COCO dataset demonstrate the superiority of our method.
arXiv Detail & Related papers (2022-03-18T07:41:48Z)
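The ContrastMask entry above describes pulling features inside the mask region together while pushing them away from background features. A minimal sketch of one such pixel-level contrastive term follows; the prototype pooling, the temperature, and the cross-entropy form are assumptions rather than the paper's exact query/key design.

```python
import torch
import torch.nn.functional as F

def fg_bg_contrastive_loss(feats: torch.Tensor, mask: torch.Tensor, temperature: float = 0.1):
    """feats: (B, C, H, W) pixel embeddings; mask: (B, 1, H, W) binary foreground mask.
    Pixels inside the mask are pulled toward the foreground prototype and pushed away
    from the background prototype, and vice versa (illustrative sketch)."""
    feats = F.normalize(feats, dim=1)
    fg_proto = F.normalize((feats * mask).sum((2, 3)) / mask.sum((2, 3)).clamp(min=1), dim=1)
    bg_proto = F.normalize((feats * (1 - mask)).sum((2, 3)) / (1 - mask).sum((2, 3)).clamp(min=1), dim=1)

    # Similarity of every pixel to each prototype: (B, 2, H, W) logits over {foreground, background}.
    protos = torch.stack([fg_proto, bg_proto], dim=1)               # (B, 2, C)
    sims = torch.einsum("bchw,bkc->bkhw", feats, protos) / temperature

    # Foreground pixels should prefer the fg prototype (index 0), background pixels the bg prototype (index 1).
    targets = (1 - mask).long().squeeze(1)                          # (B, H, W)
    return F.cross_entropy(sims, targets)

loss = fg_bg_contrastive_loss(torch.randn(4, 64, 32, 32),
                              (torch.rand(4, 1, 32, 32) > 0.5).float())
```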
- Few-shot semantic segmentation via mask aggregation [5.886986014593717]
Few-shot semantic segmentation aims to recognize novel classes from only a few labelled samples.
Previous works have typically regarded it as a pixel-wise classification problem.
We introduce a mask-based classification method for addressing this problem.
arXiv Detail & Related papers (2022-02-15T07:13:09Z)
- Per-Pixel Classification is Not All You Need for Semantic Segmentation [184.2905747595058]
Mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks.
We propose MaskFormer, a simple mask classification model which predicts a set of binary masks.
Our method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.
arXiv Detail & Related papers (2021-07-13T17:59:50Z)
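The MaskFormer entry above frames segmentation as mask classification: the model predicts a fixed set of N binary masks, each paired with a class distribution, rather than per-pixel class logits. The toy sketch below shows only how such outputs combine into a semantic map; the transformer decoder, matching, and losses are omitted, and the random tensors stand in for real model outputs.

```python
import torch

B, N, K, H, W = 2, 100, 20, 64, 64        # batch, mask queries, classes, spatial size

# Outputs a mask-classification model would produce (random stand-ins here):
mask_logits = torch.randn(B, N, H, W)      # one binary-mask logit map per query
class_logits = torch.randn(B, N, K + 1)    # per-query class scores, incl. a "no object" class

# Combine into per-pixel semantic scores by weighting each query's mask with its class probabilities.
mask_probs = mask_logits.sigmoid()                           # (B, N, H, W)
class_probs = class_logits.softmax(-1)[..., :K]              # drop "no object" -> (B, N, K)
semantic = torch.einsum("bnk,bnhw->bkhw", class_probs, mask_probs)
prediction = semantic.argmax(dim=1)                          # (B, H, W) class map
```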
- Image Inpainting by End-to-End Cascaded Refinement with Mask Awareness [66.55719330810547]
Inpainting arbitrary missing regions is challenging because learning valid features for various masked regions is nontrivial.
We propose a novel mask-aware inpainting solution that learns multi-scale features for missing regions in the encoding phase.
Our framework is validated both quantitatively and qualitatively via extensive experiments on three public datasets.
arXiv Detail & Related papers (2021-04-28T13:17:47Z)
- Investigating and Simplifying Masking-based Saliency Methods for Model Interpretability [5.387323728379395]
Saliency maps that identify the most informative regions of an image are valuable for model interpretability.
A common approach to creating saliency maps involves generating input masks that mask out portions of an image.
We show that a masking model can be trained with as few as 10 examples per class and still generate saliency maps with only a 0.7-point increase in localization error.
arXiv Detail & Related papers (2020-10-19T18:00:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.