CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object
Localization Perspective
- URL: http://arxiv.org/abs/2403.06676v1
- Date: Mon, 11 Mar 2024 12:48:22 GMT
- Title: CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object
Localization Perspective
- Authors: Shunsuke Yasuki, Masato Taki
- Abstract summary: Large kernel CNNs have been reported to perform well in downstream vision tasks as well as in classification.
We revisit the performance of large kernel CNNs in downstream tasks, focusing on the weakly supervised object localization task.
Our study compares the modern large kernel CNNs ConvNeXt, RepLKNet, and SLaK to test the validity of the naive expectation that ERF size is important for improving downstream task performance.
- Score: 2.7195102129095003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, convolutional neural networks (CNNs) with large kernels have
attracted much attention in the computer vision field, following the success of
the Vision Transformers. Large kernel CNNs have been reported to perform well
in downstream vision tasks as well as in classification. The high performance
of large kernel CNNs in downstream tasks has been attributed to the large
effective receptive field (ERF) produced by large kernels, but this view has
not been fully tested. We therefore revisit the performance of large kernel
CNNs in downstream tasks, focusing on the weakly supervised object localization
(WSOL) task. WSOL, a difficult downstream task that is not fully supervised,
provides a new angle for exploring the capabilities of large kernel CNNs. Our
study compares the modern large kernel CNNs ConvNeXt, RepLKNet, and SLaK to
test the validity of the naive expectation that ERF size is important for
improving downstream task performance. Our analysis of the factors contributing
to the high performance provides a different perspective: the main factor is
improvement of the feature maps. Furthermore, we find that modern CNNs are
robust to the long-discussed CAM problem in WSOL, in which only local regions
of objects are activated. CAM is the most classic WSOL method, but because of
this problem it is often used only as a baseline for comparison. However,
experiments on the CUB-200-2011 dataset show that simply combining a large
kernel CNN, CAM, and simple data augmentation can achieve performance
(90.99% MaxBoxAcc) comparable to the latest WSOL method, which is CNN-based and
requires special training or complex post-processing. The code is available at
https://github.com/snskysk/CAM-Back-Again.
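Since the recipe described above is essentially classic CAM applied to a large kernel backbone and scored with MaxBoxAcc, a minimal sketch of that pipeline may help; the tensor shapes, the fixed 0.5 threshold, and the helper names below are illustrative assumptions, not the authors' released code (see the repository above for that).

```python
import torch
import torch.nn.functional as F

def class_activation_map(features, fc_weight, class_idx):
    # Classic CAM: weight the last-stage feature maps by the classifier
    # weights of the target class and sum over channels.
    #   features : (C, H, W) feature maps from the final conv stage
    #   fc_weight: (num_classes, C) weights of the final linear layer
    cam = torch.einsum("c,chw->hw", fc_weight[class_idx], features)
    cam = F.relu(cam)                                   # keep positive evidence only
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam                                          # normalized to [0, 1]

def cam_to_box(cam, threshold=0.5):
    # Threshold the map and take the tight box around activated pixels,
    # i.e. the box that MaxBoxAcc-style metrics compare against ground truth.
    ys, xs = torch.where(cam >= threshold)
    if xs.numel() == 0:
        return None
    return xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item()
```

In practice the CAM is upsampled to the input resolution before thresholding, and MaxBoxAcc reports the box accuracy at the best-performing threshold rather than a single fixed one.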
Related papers
- OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation [70.17681136234202]
We reexamine the design distinctions and test the limits of what a sparse CNN can achieve.
We propose two key components, i.e., adaptive receptive fields (spatially) and adaptive relation, to bridge the gap.
This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module.
arXiv Detail & Related papers (2024-03-21T14:06:38Z)
- Shift-ConvNets: Small Convolutional Kernel with Large Kernel Effects [8.933264104073832]
Small convolutional kernels combined with appropriate convolution operations can approach the effects of large kernels.
We propose a shift-wise operator that ensures CNNs capture long-range dependencies with the help of a sparse mechanism.
On ImageNet-1k, our shift-wise enhanced CNN model outperforms state-of-the-art models.
arXiv Detail & Related papers (2024-01-23T13:13:45Z)
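The Shift-ConvNets entry above pairs small kernels with a shift-wise, sparse aggregation to reach long-range dependencies. The snippet below is only a generic spatial-shift aggregation illustrating how shifting small-kernel outputs can enlarge the covered footprint; it is an assumption about the general idea, not the paper's shift-wise operator.

```python
import torch
import torch.nn as nn

class ShiftAggregate(nn.Module):
    # Run a small depthwise convolution, shift copies of its output along
    # H and W, and average them, so the combined footprint resembles that
    # of a larger kernel.  (Illustrative sketch only.)
    def __init__(self, channels, shift=3):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.shift = shift

    def forward(self, x):
        y = self.dw(x)
        s = self.shift
        out = y.clone()
        out += torch.roll(y, shifts=(s, 0), dims=(2, 3))    # shift down
        out += torch.roll(y, shifts=(-s, 0), dims=(2, 3))   # shift up
        out += torch.roll(y, shifts=(0, s), dims=(2, 3))    # shift right
        out += torch.roll(y, shifts=(0, -s), dims=(2, 3))   # shift left
        return out / 5.0

# usage sketch
x = torch.randn(1, 32, 56, 56)
y = ShiftAggregate(32)(x)   # same shape as x
```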
- Transferability of Convolutional Neural Networks in Stationary Learning Tasks [96.00428692404354]
We introduce a novel framework for efficient training of convolutional neural networks (CNNs) for large-scale spatial problems.
We show that a CNN trained on small windows of such signals achieves nearly the same performance on much larger windows without retraining.
Our results show that the CNN is able to tackle problems with many hundreds of agents after being trained with fewer than ten.
arXiv Detail & Related papers (2023-07-21T13:51:45Z)
- InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions [95.94629864981091]
This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain gains from increasing parameters and training data, as ViTs do.
The proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data like ViTs.
arXiv Detail & Related papers (2022-11-10T18:59:04Z)
- Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs [148.0476219278875]
We revisit large kernel design in modern convolutional neural networks (CNNs).
Inspired by recent advances of vision transformers (ViTs), in this paper, we demonstrate that using a few large convolutional kernels instead of a stack of small kernels could be a more powerful paradigm.
We propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31x31, in contrast to commonly used 3x3.
arXiv Detail & Related papers (2022-03-13T17:22:44Z)
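For the RepLKNet entry above, the core ingredient is a depthwise convolution with a very large kernel. The block below is a minimal sketch of that ingredient only; RepLKNet additionally re-parameterizes a parallel small-kernel branch and uses shortcut connections, which are omitted here.

```python
import torch
import torch.nn as nn

class LargeDepthwiseConv(nn.Module):
    # Depthwise 31x31 convolution followed by a 1x1 pointwise convolution;
    # the depthwise factorization keeps parameters and FLOPs manageable
    # even at this kernel size.
    def __init__(self, channels, kernel_size=31):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, kernel_size,
                            padding=kernel_size // 2, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pw(self.dw(x))

# usage sketch
x = torch.randn(1, 64, 56, 56)
y = LargeDepthwiseConv(64)(x)   # spatial size preserved: (1, 64, 56, 56)
```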
- A Novel Sleep Stage Classification Using CNN Generated by an Efficient Neural Architecture Search with a New Data Processing Trick [4.365107026636095]
We propose an efficient five-sleep-stage classification method using convolutional neural networks (CNNs) with a novel data processing trick.
We make full use of a genetic algorithm (GA), NASG, to search for the best CNN architecture.
We verify the convergence of our data processing trick and compare the performance of traditional CNNs before and after applying it.
arXiv Detail & Related papers (2021-10-27T10:36:52Z)
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network nor modifications to the original model.
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
- Spectral Leakage and Rethinking the Kernel Size in CNNs [10.432041176720842]
We show that the small size of CNN kernels makes them susceptible to spectral leakage.
We demonstrate improved classification accuracy over baselines with conventional $3\times 3$ kernels.
We also show that CNNs employing the Hamming window display increased robustness against certain types of adversarial attacks.
arXiv Detail & Related papers (2021-01-25T14:49:29Z)
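The Spectral Leakage entry above applies a Hamming window to convolution kernels so that their sharp spatial boundaries leak less energy across the spectrum. Below is a minimal sketch of windowing a kernel before the forward convolution; the 7x7 kernel size and random weights are placeholders, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

def hamming_windowed_kernel(weight):
    # Multiply each convolution kernel by a separable 2D Hamming window
    # to taper the kernel toward its borders and reduce spectral leakage.
    #   weight: (out_ch, in_ch, k, k)
    k = weight.shape[-1]
    win_1d = torch.hamming_window(k, periodic=False, dtype=weight.dtype)
    win_2d = torch.outer(win_1d, win_1d)     # (k, k) separable window
    return weight * win_2d                   # broadcast over channel dims

# usage sketch: window a 7x7 kernel, then convolve as usual
x = torch.randn(1, 3, 32, 32)
w = torch.randn(16, 3, 7, 7)
y = F.conv2d(x, hamming_windowed_kernel(w), padding=3)
```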
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embedding of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
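For the Curriculum By Smoothing entry directly above, the idea is to low-pass filter a CNN's feature maps early in training and anneal the filtering away as training progresses. The sketch below uses a depthwise Gaussian blur whose sigma is decayed by hand; the schedule and kernel size are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel2d(sigma, size=5):
    # Normalized 2D Gaussian (low-pass) kernel built from a 1D profile.
    coords = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    return torch.outer(g, g)

def smooth_feature_map(feat, sigma):
    # Blur every channel with the same low-pass kernel (depthwise conv).
    c = feat.shape[1]
    k = gaussian_kernel2d(sigma).to(feat)
    k = k.unsqueeze(0).unsqueeze(0).repeat(c, 1, 1, 1)   # (C, 1, 5, 5)
    return F.conv2d(feat, k, padding=k.shape[-1] // 2, groups=c)

# usage sketch: heavy smoothing early in training, little at the end
feat = torch.randn(2, 64, 28, 28)
for sigma in (2.0, 1.0, 0.25):
    feat_smoothed = smooth_feature_map(feat, sigma)
```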