F-CAM: Full Resolution CAM via Guided Parametric Upscaling
- URL: http://arxiv.org/abs/2109.07069v1
- Date: Wed, 15 Sep 2021 04:45:20 GMT
- Title: F-CAM: Full Resolution CAM via Guided Parametric Upscaling
- Authors: Soufiane Belharbi, Aydin Sarraf, Marco Pedersoli, Ismail Ben Ayed,
Luke McCaffrey, Eric Granger
- Abstract summary: Class Activation Mapping (CAM) methods have recently gained much attention for weakly-supervised object localization (WSOL) tasks.
CAM methods are typically integrated within off-the-shelf CNN backbones, such as ResNet50.
We introduce a generic method for parametric upscaling of CAMs that allows constructing accurate full resolution CAMs.
- Score: 20.609010268320013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Class Activation Mapping (CAM) methods have recently gained much attention
for weakly-supervised object localization (WSOL) tasks, allowing for CNN
visualization and interpretation without training on fully annotated image
datasets. CAM methods are typically integrated within off-the-shelf CNN
backbones, such as ResNet50. Due to convolution and downsampling/pooling
operations, these backbones yield low resolution CAMs with a down-scaling
factor of up to 32, making accurate localization more difficult. Interpolation
is required to restore full-size CAMs, but it does not consider the
statistical properties of the objects, leading to activations with inconsistent
boundaries and inaccurate localizations. As an alternative, we introduce a
generic method for parametric upscaling of CAMs that allows constructing
accurate full resolution CAMs (F-CAMs). In particular, we propose a trainable
decoding architecture that can be connected to any CNN classifier to produce
more accurate CAMs. Given an original (low resolution) CAM, foreground and
background pixels are randomly sampled for fine-tuning the decoder. Additional
priors, such as image statistics and size constraints, are also considered to
expand and refine object boundaries. Extensive experiments using three CNN
backbones and six WSOL baselines on the CUB-200-2011 and OpenImages datasets
indicate that our F-CAM method yields a significant improvement in CAM
localization accuracy. F-CAM performance is competitive with state-of-the-art WSOL
methods, yet it requires fewer computational resources during inference.
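As a rough illustration of the sampling step described in the abstract, the sketch below upsamples a low resolution CAM by the down-scaling factor (32, so a 7x7 map becomes 224x224) and randomly draws a few foreground and background pixel seeds from it. This is not the authors' implementation: the function names, the thresholds `fg_thresh` and `bg_thresh`, the seed counts, and the use of nearest-neighbor repetition in place of interpolation are all assumptions made for illustration, and the trainable decoder itself is not shown.

```python
import numpy as np


def upsample_nearest(cam, factor):
    """Upscale a 2D CAM by an integer factor using nearest-neighbor repetition.

    Assumption: a simple stand-in for the interpolation mentioned in the abstract.
    """
    return np.repeat(np.repeat(cam, factor, axis=0), factor, axis=1)


def sample_seeds(cam, n_fg=2, n_bg=2, fg_thresh=0.7, bg_thresh=0.2, rng=None):
    """Randomly sample pixel seeds from a CAM normalized to [0, 1].

    Hypothetical helper: thresholds and counts are illustrative choices.
    Returns an integer map: 1 = foreground seed, 0 = background seed,
    -1 = pixel not sampled (to be ignored by a seed-based loss).
    """
    rng = np.random.default_rng() if rng is None else rng
    seeds = np.full(cam.shape, -1, dtype=np.int64)
    regions = (
        (1, cam >= fg_thresh, n_fg),  # strongly activated pixels
        (0, cam <= bg_thresh, n_bg),  # weakly activated pixels
    )
    for label, mask, n in regions:
        candidates = np.flatnonzero(mask)
        if candidates.size:
            chosen = rng.choice(candidates, size=min(n, candidates.size), replace=False)
            seeds.flat[chosen] = label
    return seeds


rng = np.random.default_rng(0)

# A toy 7x7 CAM, min-max normalized so its values span [0, 1].
low_res_cam = rng.random((7, 7))
low_res_cam = (low_res_cam - low_res_cam.min()) / (low_res_cam.max() - low_res_cam.min())

# Down-scaling factor of 32: 7 * 32 = 224 pixels per side.
full_cam = upsample_nearest(low_res_cam, 32)
seeds = sample_seeds(full_cam, rng=rng)

print(full_cam.shape)  # (224, 224)
```

Resampling the seeds at every training step, rather than fixing them once, is one plausible way to use such a map, since it exposes the decoder to different pixels from the same regions over time.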
Related papers
- Generalizing GradCAM for Embedding Networks [0.0]
We present a new method EmbeddingCAM, which generalizes the Grad-CAM for embedding networks.
We show the effectiveness of our method on the CUB-200-2011 dataset and also present quantitative and qualitative analysis on the dataset.
arXiv Detail & Related papers (2024-02-01T04:58:06Z) - BroadCAM: Outcome-agnostic Class Activation Mapping for Small-scale
Weakly Supervised Applications [69.22739434619531]
We propose an outcome-agnostic CAM approach, called BroadCAM, for small-scale weakly supervised applications.
By evaluating BroadCAM on VOC2012 and BCSS-WSSS for WSSS and OpenImages30k for WSOL, BroadCAM demonstrates superior performance.
arXiv Detail & Related papers (2023-09-07T06:45:43Z) - TCAM: Temporal Class Activation Maps for Object Localization in
Weakly-Labeled Unconstrained Videos [22.271760669551817]
Weakly supervised video object localization (WSVOL) allows locating objects in videos using only global video tags such as object class.
In this paper, we leverage the successful class activation mapping (CAM) methods, designed for WSOL based on still images.
A new Temporal CAM (TCAM) method is introduced to train a discriminant deep learning (DL) model to exploit temporal information in videos.
arXiv Detail & Related papers (2022-08-30T21:20:34Z) - Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation [88.55040177178442]
Class activation maps (CAM) is arguably the most standard step of generating pseudo masks for semantic segmentation.
Yet, the crux of the unsatisfactory pseudo masks is the binary cross-entropy loss (BCE) widely used in CAM.
We introduce an embarrassingly simple yet surprisingly effective method: reactivating the converged CAM with BCE by using softmax cross-entropy loss (SCE).
The evaluation on both PASCAL VOC and MSCOCO shows that ReCAM not only generates high-quality masks, but also supports plug-and-play in any CAM variant with little overhead.
arXiv Detail & Related papers (2022-03-02T09:14:58Z) - PCAM: Product of Cross-Attention Matrices for Rigid Registration of
Point Clouds [79.99653758293277]
PCAM is a neural network whose key element is a pointwise product of cross-attention matrices.
We show that PCAM achieves state-of-the-art results among methods which, like us, solve steps (a) and (b) jointly via deepnets.
arXiv Detail & Related papers (2021-10-04T09:23:27Z) - TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised
Object Localization [112.46381729542658]
Weakly supervised object localization (WSOL) is a challenging problem when given image category labels.
We introduce the token semantic coupled attention map (TS-CAM) to take full advantage of the self-attention mechanism in visual transformer for long-range dependency extraction.
arXiv Detail & Related papers (2021-03-27T09:43:16Z) - Use HiResCAM instead of Grad-CAM for faithful explanations of
convolutional neural networks [89.56292219019163]
Explanation methods facilitate the development of models that learn meaningful concepts and avoid exploiting spurious correlations.
We illustrate a previously unrecognized limitation of the popular neural network explanation method Grad-CAM.
We propose HiResCAM, a class-specific explanation method that is guaranteed to highlight only the locations the model used to make each prediction.
arXiv Detail & Related papers (2020-11-17T19:26:14Z) - High resolution weakly supervised localization architectures for medical
images [3.7117844677482146]
We propose a model for high-accuracy weakly-supervised localization that achieved 0.62 average point localization accuracy on NIH's Chest X-Ray 14 dataset.
Our experiments suggest that Global Average Pooling (GAP) and Group Normalization are the main culprits that worsen the localization accuracy of CAM.
arXiv Detail & Related papers (2020-10-22T06:42:00Z) - IS-CAM: Integrated Score-CAM for axiomatic-based explanations [0.0]
We propose IS-CAM (Integrated Score-CAM), where we introduce the integration operation within the Score-CAM pipeline to achieve visually sharper attribution maps.
Our method is evaluated on 2000 randomly selected images from the ILSVRC 2012 Validation dataset, which proves the versatility of IS-CAM to account for different models and methods.
arXiv Detail & Related papers (2020-10-06T21:03:03Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.