Refining activation downsampling with SoftPool
- URL: http://arxiv.org/abs/2101.00440v3
- Date: Thu, 18 Mar 2021 09:51:42 GMT
- Title: Refining activation downsampling with SoftPool
- Authors: Alexandros Stergiou, Ronald Poppe, Grigorios Kalliatakis
- Abstract summary: Convolutional Neural Networks (CNNs) use pooling to decrease the size of activation maps.
We propose SoftPool: a fast and efficient method for exponentially weighted activation downsampling.
We show that SoftPool can retain more information in the reduced activation maps.
- Score: 74.1840492087968
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Convolutional Neural Networks (CNNs) use pooling to decrease the size of
activation maps. This process is crucial to increase the receptive fields and
to reduce computational requirements of subsequent convolutions. An important
feature of the pooling operation is the minimization of information loss, with
respect to the initial activation maps, without a significant impact on the
computation and memory overhead. To meet these requirements, we propose
SoftPool: a fast and efficient method for exponentially weighted activation
downsampling. Through experiments across a range of architectures and pooling
methods, we demonstrate that SoftPool can retain more information in the
reduced activation maps. This refined downsampling leads to improvements in a
CNN's classification accuracy. Experiments with pooling layer substitutions on
ImageNet1K show an increase in accuracy over both original architectures and
other pooling methods. We also test SoftPool on video datasets for action
recognition. Again, through the direct replacement of pooling layers, we
observe consistent performance improvements while computational loads and
memory requirements remain limited.
Related papers
- WiNet: Wavelet-based Incremental Learning for Efficient Medical Image Registration [68.25711405944239]
Deep image registration has demonstrated exceptional accuracy and fast inference.
Recent advances have adopted either multiple cascades or pyramid architectures to estimate dense deformation fields in a coarse-to-fine manner.
We introduce a model-driven WiNet that incrementally estimates scale-wise wavelet coefficients for the displacement/velocity field across various scales.
arXiv Detail & Related papers (2024-07-18T11:51:01Z) - Self-Attentive Pooling for Efficient Deep Learning [6.822466048176652]
We propose a novel non-local self-attentive pooling method that can be used as a drop-in replacement to the standard pooling layers.
We surpass the test accuracy of existing pooling techniques on different variants of MobileNet-V2 on ImageNet by an average of 1.2%.
Our approach achieves 1.43% higher test accuracy compared to SOTA techniques with iso-memory footprints.
arXiv Detail & Related papers (2022-09-16T00:35:14Z) - Pooling Revisited: Your Receptive Field is Suboptimal [35.11562214480459]
The size and shape of the receptive field determine how the network aggregates local information.
We propose a simple yet effective Dynamically Optimized Pooling operation, referred to as DynOPool.
Our experiments show that the models equipped with the proposed learnable resizing module outperform the baseline networks on multiple datasets in image classification and semantic segmentation.
arXiv Detail & Related papers (2022-05-30T17:03:40Z) - AdaPool: Exponential Adaptive Pooling for Information-Retaining
Downsampling [82.08631594071656]
Pooling layers are essential building blocks of Convolutional Neural Networks (CNNs)
We propose an adaptive and exponentially weighted pooling method named adaPool.
We demonstrate how adaPool improves the preservation of detail through a range of tasks including image and video classification and object detection.
arXiv Detail & Related papers (2021-11-01T08:50:37Z) - Token Pooling in Vision Transformers [37.11990688046186]
In vision transformers, self-attention is not the major bottleneck, e.g., more than 80% of the computation is spent on fully-connected layers.
We propose a novel token downsampling method, called Token Pooling, efficiently exploiting redundancies in the images and intermediate token representations.
Our experiments show that Token Pooling significantly improves the cost-accuracy trade-off over the state-of-the-art downsampling.
arXiv Detail & Related papers (2021-10-08T02:22:50Z) - Temporal Attention-Augmented Graph Convolutional Network for Efficient
Skeleton-Based Human Action Recognition [97.14064057840089]
Graphal networks (GCNs) have been very successful in modeling non-Euclidean data structures.
Most GCN-based action recognition methods use deep feed-forward networks with high computational complexity to process all skeletons in an action.
We propose a temporal attention module (TAM) for increasing the efficiency in skeleton-based action recognition.
arXiv Detail & Related papers (2020-10-23T08:01:55Z) - AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z) - Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace conventional ReLU with Bounded ReLU and find that the decline is due to activation quantization.
Our integer networks achieve equivalent performance as the corresponding FPN networks, but have only 1/4 memory cost and run 2x faster on modern GPU.
arXiv Detail & Related papers (2020-06-21T08:23:03Z) - RNNPool: Efficient Non-linear Pooling for RAM Constrained Inference [24.351577383531616]
We introduce RNNPool, a novel pooling operator based on Recurrent Neural Networks (RNNs)
An RNNPool layer can effectively replace multiple blocks in a variety of architectures like MobileNets, DenseNet when applied to standard vision tasks like image classification and face detection.
We use RNNPool with the standard S3FD architecture to construct a face detection method that achieves state-of-the-art MAP for tiny ARM Cortex-M4 class microcontrollers with under 256 KB of RAM.
arXiv Detail & Related papers (2020-02-27T05:22:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.