No More Strided Convolutions or Pooling: A New CNN Building Block for
Low-Resolution Images and Small Objects
- URL: http://arxiv.org/abs/2208.03641v1
- Date: Sun, 7 Aug 2022 05:09:18 GMT
- Title: No More Strided Convolutions or Pooling: A New CNN Building Block for
Low-Resolution Images and Small Objects
- Authors: Raja Sunkara and Tie Luo
- Abstract summary: Convolutional neural networks (CNNs) have made resounding success in many computer vision tasks.
However, their performance degrades rapidly on tougher tasks where images are of low resolution or objects are small.
We propose a new CNN building block called SPD-Conv in place of each strided convolution layer and each pooling layer.
- Score: 3.096615629099617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks (CNNs) have made resounding success in many
computer vision tasks such as image classification and object detection.
However, their performance degrades rapidly on tougher tasks where images are
of low resolution or objects are small. In this paper, we point out that this
roots in a defective yet common design in existing CNN architectures, namely
the use of strided convolution and/or pooling layers, which results in a loss
of fine-grained information and learning of less effective feature
representations. To this end, we propose a new CNN building block called
SPD-Conv in place of each strided convolution layer and each pooling layer
(thus eliminating them altogether). SPD-Conv is comprised of a space-to-depth
(SPD) layer followed by a non-strided convolution (Conv) layer, and can be
applied in most if not all CNN architectures. We explain this new design under
two most representative computer vision tasks: object detection and image
classification. We then create new CNN architectures by applying SPD-Conv to
YOLOv5 and ResNet, and empirically show that our approach significantly
outperforms state-of-the-art deep learning models, especially on tougher tasks
with low-resolution images and small objects. We have open-sourced our code at
https://github.com/LabSAINT/SPD-Conv.
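The key property of the SPD step is easy to verify: a space-to-depth rearrangement moves every pixel of each s×s spatial block into the channel dimension, so spatial downsampling discards no information, and a subsequent non-strided convolution then reduces channels. Below is a minimal NumPy sketch of the SPD rearrangement alone (the function name and the 2×2 scale are illustrative; the authors' actual implementation is in the repository linked above):

```python
import numpy as np

def space_to_depth(x, scale=2):
    """Rearrange spatial blocks into channels: (C, H, W) -> (C*scale^2, H/scale, W/scale).

    Unlike strided convolution or pooling, this is a pure permutation of
    values, so no fine-grained information is lost.
    """
    c, h, w = x.shape
    assert h % scale == 0 and w % scale == 0, "H and W must be divisible by scale"
    x = x.reshape(c, h // scale, scale, w // scale, scale)
    x = x.transpose(0, 2, 4, 1, 3)  # (C, scale, scale, H/scale, W/scale)
    return x.reshape(c * scale * scale, h // scale, w // scale)

# Demo: an 8x (channel) expansion with 2x spatial reduction, losslessly.
x = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
y = space_to_depth(x, scale=2)
print(y.shape)  # (8, 2, 2)
```

In SPD-Conv this layer is followed by an ordinary stride-1 convolution that maps the expanded channels back down, letting the network learn which of the preserved details to keep.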
Related papers
- Squeeze-and-Remember Block [4.150676163661315]
The "Squeeze-and-Remember" (SR) block is a novel architectural unit that gives CNNs dynamic memory-like functionality.
SR block selectively memorizes important features during training, and then adaptively re-applies these features during inference.
This improves the network's ability to make contextually informed predictions.
arXiv Detail & Related papers (2024-10-01T16:06:31Z)
- Enhancing Small Object Encoding in Deep Neural Networks: Introducing Fast&Focused-Net with Volume-wise Dot Product Layer [0.0]
We introduce Fast&Focused-Net, a novel deep neural network architecture tailored for encoding small objects into fixed-length feature vectors.
Fast&Focused-Net employs a series of our newly proposed layer, the Volume-wise Dot Product (VDP) layer, designed to address several inherent limitations of CNNs.
For small object classification tasks, our network outperformed state-of-the-art methods on datasets such as CIFAR-10, CIFAR-100, STL-10, SVHN-Cropped, and Fashion-MNIST.
In the context of larger image classification, when combined with a transformer encoder (ViT
arXiv Detail & Related papers (2024-01-18T09:31:25Z)
- T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called transformers, has shown significant performance in natural language processing.
In this paper, we design a novel attention mechanism, derived via Taylor expansion, whose cost is linearly related to the resolution; based on this attention, a network called $T$-former is designed for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
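The summary does not give T-former's exact attention formula, but attention whose cost grows linearly with resolution is typically obtained by replacing the softmax kernel with a factorizable approximation, e.g. a first-order Taylor term. The sketch below shows a generic linear-attention variant of this kind; it is an illustrative assumption, not T-former's actual kernel:

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Generic linear attention via a first-order kernel sim(q,k) = 1 + q.k.

    Factorizing the kernel lets us precompute K^T V once, giving O(n*d^2)
    cost instead of the O(n^2*d) of exact softmax attention.
    """
    # Normalize rows so 1 + q.k stays non-negative (valid attention weights).
    Qn = Q / (np.linalg.norm(Q, axis=-1, keepdims=True) + eps)
    Kn = K / (np.linalg.norm(K, axis=-1, keepdims=True) + eps)
    kv = Kn.T @ V                            # (d, d_v), shared across queries
    v_sum = V.sum(axis=0)                    # contribution of the constant term
    num = v_sum + Qn @ kv                    # (n, d_v)
    den = K.shape[0] + Qn @ Kn.sum(axis=0)   # (n,) normalizer
    return num / den[:, None]
```

Because the output rows are convex combinations of the rows of V, identical values pass through unchanged, which is a quick sanity check on any attention implementation.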
arXiv Detail & Related papers (2023-05-12T04:10:42Z)
- LR-Net: A Block-based Convolutional Neural Network for Low-Resolution Image Classification [0.0]
We develop a novel image classification architecture, composed of blocks that are designed to learn both low level and global features from noisy and low-resolution images.
The design of the blocks was heavily influenced by residual connections and Inception modules, in order to increase performance and reduce parameter count.
We have performed in-depth tests that demonstrate the presented architecture is faster and more accurate than existing cutting-edge convolutional neural networks.
arXiv Detail & Related papers (2022-07-19T20:01:11Z)
- Towards a General Purpose CNN for Long Range Dependencies in $\mathrm{N}$D [49.57261544331683]
We propose a single CNN architecture equipped with continuous convolutional kernels for tasks on arbitrary resolution, dimensionality and length without structural changes.
We show the generality of our approach by applying the same CCNN to a wide set of tasks on sequential ($1\mathrm{D}$) and visual data ($2\mathrm{D}$).
Our CCNN performs competitively and often outperforms the current state-of-the-art across all tasks considered.
arXiv Detail & Related papers (2022-06-07T15:48:02Z)
- Content-Aware Convolutional Neural Networks [98.97634685964819]
Convolutional Neural Networks (CNNs) have achieved great success due to the powerful feature learning ability of convolution layers.
We propose a Content-aware Convolution (CAC) that automatically detects the smooth windows and applies a 1x1 convolutional kernel to replace the original large kernel.
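As a rough illustration of that idea, the sketch below detects "smooth" windows by local variance (an assumption for illustration; the paper's actual smoothness detector is not described in this summary and likely differs) and switches between a full 3x3 kernel and a cheap 1x1 weight:

```python
import numpy as np

def content_aware_conv(img, k3, k1, var_thresh=1e-3):
    """Illustrative content-aware convolution (single channel, 'same' padding).

    Windows whose variance falls below var_thresh are treated as smooth and
    convolved with the scalar 1x1 weight k1; textured windows use the full
    3x3 kernel k3. This mirrors the compute-saving idea, not the exact CAC.
    """
    h, w = img.shape
    pad = np.pad(img, 1, mode="edge")
    out = np.empty((h, w), dtype=np.float64)
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 3, j:j + 3]
            if win.var() < var_thresh:       # smooth region -> 1x1 kernel
                out[i, j] = img[i, j] * k1
            else:                            # textured region -> 3x3 kernel
                out[i, j] = float((win * k3).sum())
    return out
```

On a constant image every window is smooth, so the entire output is produced by the cheap 1x1 path, which is where the claimed savings come from.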
arXiv Detail & Related papers (2021-06-30T03:54:35Z)
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network nor modifications to the original model.
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
- An Evolution of CNN Object Classifiers on Low-Resolution Images [0.4129225533930965]
Object classification from low-quality images is difficult due to the variance of object colors, aspect ratios, and cluttered backgrounds.
Deep convolutional neural networks (DCNNs) have been demonstrated as very powerful systems for facing the challenge of object classification from high-resolution images.
In this paper, we investigate an optimal architecture that accurately classifies low-quality images using DCNNs architectures.
arXiv Detail & Related papers (2021-01-03T18:44:23Z)
- Improved Residual Networks for Image and Video Recognition [98.10703825716142]
Residual networks (ResNets) represent a powerful type of convolutional neural network (CNN) architecture.
We show consistent improvements in accuracy and learning convergence over the baseline.
Our proposed approach allows us to train extremely deep networks, while the baseline shows severe optimization issues.
arXiv Detail & Related papers (2020-04-10T11:09:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.