CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object
Localization Perspective
- URL: http://arxiv.org/abs/2403.06676v1
- Date: Mon, 11 Mar 2024 12:48:22 GMT
- Title: CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object
Localization Perspective
- Authors: Shunsuke Yasuki, Masato Taki
- Abstract summary: Large kernel CNNs have been reported to perform well in downstream vision tasks as well as in classification.
We revisit the performance of large kernel CNNs in downstream tasks, focusing on the weakly supervised object localization task.
Our study compares the modern large kernel CNNs ConvNeXt, RepLKNet, and SLaK to test the validity of the naive expectation that ERF size is important for improving downstream task performance.
- Score: 2.7195102129095003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, convolutional neural networks (CNNs) with large kernels have
attracted much attention in the computer vision field, following the success of
the Vision Transformers. Large kernel CNNs have been reported to perform well
in downstream vision tasks as well as in classification. The high performance
of large kernel CNNs in downstream tasks has been attributed to the large
effective receptive field (ERF) produced by large kernels, but this view has
not been fully tested. We therefore revisit the performance of large kernel
CNNs in downstream tasks, focusing on the weakly supervised object localization
(WSOL) task. WSOL, a difficult downstream task that is not fully supervised,
provides a new angle for exploring the capabilities of large kernel CNNs. Our
study compares the modern large kernel CNNs ConvNeXt, RepLKNet, and SLaK to
test the validity of the naive expectation that ERF size is important for
improving downstream task performance. Our analysis of the factors contributing
to the high performance provides a different perspective: the main factor is
improvement of the feature maps. Furthermore, we find that modern CNNs are
robust to the long-discussed CAM problem in WSOL, in which only local regions
of objects are activated. CAM is the most classic WSOL method, but because of
this problem it is often used only as a baseline for comparison. However,
experiments on the CUB-200-2011 dataset show that simply combining a large
kernel CNN, CAM, and simple data augmentation can achieve performance
(90.99% MaxBoxAcc) comparable to the latest WSOL method, which is CNN-based and
requires special training or complex post-processing. The code is available at
https://github.com/snskysk/CAM-Back-Again.
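Since the recipe described above is essentially classic CAM applied to a large kernel backbone and scored with MaxBoxAcc, a minimal sketch of that pipeline may help; the tensor shapes, the fixed 0.5 threshold, and the helper names below are illustrative assumptions, not the authors' released code (see the repository above for that).

```python
import torch
import torch.nn.functional as F

def class_activation_map(features, fc_weight, class_idx):
    # Classic CAM: weight the last-stage feature maps by the classifier
    # weights of the target class and sum over channels.
    #   features : (C, H, W) feature maps from the final conv stage
    #   fc_weight: (num_classes, C) weights of the final linear layer
    cam = torch.einsum("c,chw->hw", fc_weight[class_idx], features)
    cam = F.relu(cam)                                   # keep positive evidence only
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam                                          # normalized to [0, 1]

def cam_to_box(cam, threshold=0.5):
    # Threshold the map and take the tight box around activated pixels,
    # i.e. the box that MaxBoxAcc-style metrics compare against ground truth.
    ys, xs = torch.where(cam >= threshold)
    if xs.numel() == 0:
        return None
    return xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item()
```

In practice the CAM is upsampled to the input resolution before thresholding, and MaxBoxAcc reports the box accuracy at the best-performing threshold rather than a single fixed one.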
Related papers
- OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation [70.17681136234202]
We reexamine the design distinctions and test the limits of what a sparse CNN can achieve.
We propose two key components, i.e., adaptive receptive fields (spatially) and adaptive relation, to bridge the gap.
This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module.
arXiv Detail & Related papers (2024-03-21T14:06:38Z)
- Shift-ConvNets: Small Convolutional Kernel with Large Kernel Effects [8.933264104073832]
Small convolutional kernels combined with appropriate convolution operations can approach the effects of large kernels.
We propose a shift-wise operator that ensures CNNs capture long-range dependencies with the help of a sparse mechanism.
On ImageNet-1k, our shift-wise enhanced CNN model outperforms state-of-the-art models.
arXiv Detail & Related papers (2024-01-23T13:13:45Z)
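The Shift-ConvNets entry above pairs small kernels with a shift-wise, sparse aggregation to reach long-range dependencies. The snippet below is only a generic spatial-shift aggregation illustrating how shifting small-kernel outputs can enlarge the covered footprint; it is an assumption about the general idea, not the paper's shift-wise operator.

```python
import torch
import torch.nn as nn

class ShiftAggregate(nn.Module):
    # Run a small depthwise convolution, shift copies of its output along
    # H and W, and average them, so the combined footprint resembles that
    # of a larger kernel.  (Illustrative sketch only.)
    def __init__(self, channels, shift=3):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.shift = shift

    def forward(self, x):
        y = self.dw(x)
        s = self.shift
        out = y.clone()
        out += torch.roll(y, shifts=(s, 0), dims=(2, 3))    # shift down
        out += torch.roll(y, shifts=(-s, 0), dims=(2, 3))   # shift up
        out += torch.roll(y, shifts=(0, s), dims=(2, 3))    # shift right
        out += torch.roll(y, shifts=(0, -s), dims=(2, 3))   # shift left
        return out / 5.0

# usage sketch
x = torch.randn(1, 32, 56, 56)
y = ShiftAggregate(32)(x)   # same shape as x
```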
- Transferability of Convolutional Neural Networks in Stationary Learning Tasks [96.00428692404354]
We introduce a novel framework for efficient training of convolutional neural networks (CNNs) for large-scale spatial problems.
We show that a CNN trained on small windows of such signals achieves nearly the same performance on much larger windows without retraining.
Our results show that the CNN is able to tackle problems with many hundreds of agents after being trained with fewer than ten.
arXiv Detail & Related papers (2023-07-21T13:51:45Z)
- InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions [95.94629864981091]
This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain gains from increasing parameters and training data, as ViTs do.
The proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data like ViTs.
arXiv Detail & Related papers (2022-11-10T18:59:04Z)
- Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs [148.0476219278875]
We revisit large kernel design in modern convolutional neural networks (CNNs).
Inspired by recent advances of vision transformers (ViTs), in this paper, we demonstrate that using a few large convolutional kernels instead of a stack of small kernels could be a more powerful paradigm.
We propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31x31, in contrast to commonly used 3x3.
arXiv Detail & Related papers (2022-03-13T17:22:44Z)
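For the RepLKNet entry above, the core ingredient is a depthwise convolution with a very large kernel. The block below is a minimal sketch of that ingredient only; RepLKNet additionally re-parameterizes a parallel small-kernel branch and uses shortcut connections, which are omitted here.

```python
import torch
import torch.nn as nn

class LargeDepthwiseConv(nn.Module):
    # Depthwise 31x31 convolution followed by a 1x1 pointwise convolution;
    # the depthwise factorization keeps parameters and FLOPs manageable
    # even at this kernel size.
    def __init__(self, channels, kernel_size=31):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, kernel_size,
                            padding=kernel_size // 2, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pw(self.dw(x))

# usage sketch
x = torch.randn(1, 64, 56, 56)
y = LargeDepthwiseConv(64)(x)   # spatial size preserved: (1, 64, 56, 56)
```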
- A Novel Sleep Stage Classification Using CNN Generated by an Efficient Neural Architecture Search with a New Data Processing Trick [4.365107026636095]
We propose an efficient five-sleep-stage classification method using convolutional neural networks (CNNs) with a novel data processing trick.
We make full use of a genetic algorithm (GA), NASG, to search for the best CNN architecture.
We verify the convergence of our data processing trick and compare the performance of traditional CNNs before and after applying it.
arXiv Detail & Related papers (2021-10-27T10:36:52Z)
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network nor modifications to the original model.
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
- Spectral Leakage and Rethinking the Kernel Size in CNNs [10.432041176720842]
We show that the small size of CNN kernels makes them susceptible to spectral leakage.
We demonstrate improved classification accuracy over baselines with conventional $3\times 3$ kernels.
We also show that CNNs employing the Hamming window display increased robustness against certain types of adversarial attacks.
arXiv Detail & Related papers (2021-01-25T14:49:29Z)
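The Spectral Leakage entry above applies a Hamming window to convolution kernels so that their sharp spatial boundaries leak less energy across the spectrum. Below is a minimal sketch of windowing a kernel before the forward convolution; the 7x7 kernel size and random weights are placeholders, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

def hamming_windowed_kernel(weight):
    # Multiply each convolution kernel by a separable 2D Hamming window
    # to taper the kernel toward its borders and reduce spectral leakage.
    #   weight: (out_ch, in_ch, k, k)
    k = weight.shape[-1]
    win_1d = torch.hamming_window(k, periodic=False, dtype=weight.dtype)
    win_2d = torch.outer(win_1d, win_1d)     # (k, k) separable window
    return weight * win_2d                   # broadcast over channel dims

# usage sketch: window a 7x7 kernel, then convolve as usual
x = torch.randn(1, 3, 32, 32)
w = torch.randn(16, 3, 7, 7)
y = F.conv2d(x, hamming_windowed_kernel(w), padding=3)
```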
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embedding of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
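For the Curriculum By Smoothing entry directly above, the idea is to low-pass filter a CNN's feature maps early in training and anneal the filtering away as training progresses. The sketch below uses a depthwise Gaussian blur whose sigma is decayed by hand; the schedule and kernel size are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel2d(sigma, size=5):
    # Normalized 2D Gaussian (low-pass) kernel built from a 1D profile.
    coords = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    return torch.outer(g, g)

def smooth_feature_map(feat, sigma):
    # Blur every channel with the same low-pass kernel (depthwise conv).
    c = feat.shape[1]
    k = gaussian_kernel2d(sigma).to(feat)
    k = k.unsqueeze(0).unsqueeze(0).repeat(c, 1, 1, 1)   # (C, 1, 5, 5)
    return F.conv2d(feat, k, padding=k.shape[-1] // 2, groups=c)

# usage sketch: heavy smoothing early in training, little at the end
feat = torch.randn(2, 64, 28, 28)
for sigma in (2.0, 1.0, 0.25):
    feat_smoothed = smooth_feature_map(feat, sigma)
```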