Self-Attentive Pooling for Efficient Deep Learning
- URL: http://arxiv.org/abs/2209.07659v2
- Date: Mon, 19 Sep 2022 03:53:41 GMT
- Title: Self-Attentive Pooling for Efficient Deep Learning
- Authors: Fang Chen, Gourav Datta, Souvik Kundu, Peter Beerel
- Abstract summary: We propose a novel non-local self-attentive pooling method that can be used as a drop-in replacement for standard pooling layers.
We surpass the test accuracy of existing pooling techniques on different variants of MobileNet-V2 on ImageNet by an average of 1.2%.
Our approach achieves 1.43% higher test accuracy compared to SOTA techniques with iso-memory footprints.
- Score: 6.822466048176652
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficient custom pooling techniques that can aggressively trim the dimensions
of a feature map and thereby reduce inference compute and memory footprint for
resource-constrained computer vision applications have recently gained
significant traction. However, prior pooling works extract only the local
context of the activation maps, limiting their effectiveness. In contrast, we
propose a novel non-local self-attentive pooling method that can be used as a
drop-in replacement for standard pooling layers, such as max/average pooling
or strided convolution. The proposed self-attention module uses patch
embedding, multi-head self-attention, and spatial-channel restoration, followed
by sigmoid activation and exponential soft-max. This self-attention mechanism
efficiently aggregates dependencies between non-local activation patches during
down-sampling. Extensive experiments on standard object classification and
detection tasks with various convolutional neural network (CNN) architectures
demonstrate the superiority of our proposed mechanism over the state-of-the-art
(SOTA) pooling techniques. In particular, we surpass the test accuracy of
existing pooling techniques on different variants of MobileNet-V2 on ImageNet
by an average of 1.2%. With the aggressive down-sampling of the activation maps
in the initial layers (providing up to 22x reduction in memory consumption),
our approach achieves 1.43% higher test accuracy compared to SOTA techniques
with iso-memory footprints. This enables the deployment of our models in
memory-constrained devices, such as micro-controllers (without losing
significant accuracy), because the initial activation maps consume a
significant amount of on-chip memory for high-resolution images required for
complex vision tasks. Our proposed pooling method also leverages the idea of
channel pruning to further reduce memory footprints.
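To make the pipeline named in the abstract more concrete, below is a minimal PyTorch sketch of a non-local self-attentive pooling layer: patch embedding, multi-head self-attention, spatial-channel restoration, then sigmoid activation and an exponential soft-max-style weighting before strided aggregation. The patch size, embedding width, head count, and the exponentially weighted averaging at the end are illustrative assumptions for this sketch, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttentivePool2d(nn.Module):
    """Sketch of a non-local, attention-weighted replacement for stride-s pooling."""

    def __init__(self, channels, stride=2, patch_size=4, num_heads=4, embed_dim=64):
        super().__init__()
        self.stride = stride
        # Patch embedding: non-overlapping patches mapped to embed_dim tokens.
        self.patch_embed = nn.Conv2d(channels, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        # Multi-head self-attention mixes information across all patches (non-local).
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Spatial-channel restoration: tokens back to a full-resolution, C-channel map.
        self.restore = nn.ConvTranspose2d(embed_dim, channels,
                                          kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                                   # x: (B, C, H, W)
        tokens = self.patch_embed(x)                        # (B, E, H/p, W/p)
        B, E, h, w = tokens.shape
        seq = tokens.flatten(2).transpose(1, 2)             # (B, h*w, E)
        seq, _ = self.attn(seq, seq, seq)                   # aggregate non-local context
        tokens = seq.transpose(1, 2).reshape(B, E, h, w)
        logits = self.restore(tokens)                       # back to (B, C, ~H, ~W)
        logits = F.interpolate(logits, size=x.shape[-2:],
                               mode="bilinear", align_corners=False)
        # Sigmoid bounds the attention scores; exp() turns them into soft-max-style
        # weights so the strided average below is an exponentially weighted mean.
        weight = torch.exp(torch.sigmoid(logits))
        num = F.avg_pool2d(weight * x, self.stride)
        den = F.avg_pool2d(weight, self.stride)
        return num / den                                    # (B, C, H/s, W/s)


# Example: stand-in for a stride-2 pooling layer on a 32-channel feature map.
pool = SelfAttentivePool2d(channels=32, stride=2)
out = pool(torch.randn(1, 32, 56, 56))                      # -> (1, 32, 28, 28)
```

The module preserves the channel count and reduces spatial resolution by the chosen stride, so it can slot in where a max/average pooling layer or strided convolution would normally sit.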
Related papers
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without significant computational overhead.
We evaluate our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - TinyAD: Memory-efficient anomaly detection for time series data in
Industrial IoT [43.207210990362825]
We propose a novel framework named Tiny Anomaly Detection (TinyAD) to efficiently facilitate onboard inference of CNNs for real-time anomaly detection.
To reduce the peak memory consumption of CNNs, we explore two complementary strategies: in-place and patch-by-patch memory rescheduling.
Our framework can reduce peak memory consumption by 2-5x with negligible overhead.
arXiv Detail & Related papers (2023-03-07T02:56:15Z) - EcoTTA: Memory-Efficient Continual Test-time Adaptation via
Self-distilled Regularization [71.70414291057332]
Test-time adaptation (TTA) may primarily be conducted on edge devices with limited memory.
Long-term adaptation often leads to catastrophic forgetting and error accumulation.
We present lightweight meta networks that can adapt the frozen original networks to the target domain.
arXiv Detail & Related papers (2023-03-03T13:05:30Z) - Distributed Pruning Towards Tiny Neural Networks in Federated Learning [12.63559789381064]
FedTiny is a distributed pruning framework for federated learning.
It generates specialized tiny models for memory- and computing-constrained devices.
It achieves an accuracy improvement of 2.61% while significantly reducing the computational cost by 95.91%.
arXiv Detail & Related papers (2022-12-05T01:58:45Z) - Learnable Mixed-precision and Dimension Reduction Co-design for
Low-storage Activation [9.838135675969026]
Deep convolutional neural networks (CNNs) have achieved many eye-catching results.
However, deploying CNNs on resource-constrained edge devices is limited by the memory bandwidth required to transmit large intermediate data during inference.
We propose a learnable mixed-precision and dimension reduction co-design system, which separates channels into groups and allocates compression policies according to their importance.
arXiv Detail & Related papers (2022-07-16T12:53:52Z) - MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv Detail & Related papers (2021-10-28T17:58:45Z) - ActNN: Reducing Training Memory Footprint via 2-Bit Activation
Compressed Training [68.63354877166756]
ActNN is a memory-efficient training framework that stores randomly quantized activations for backpropagation.
ActNN reduces the memory footprint of activations by 12x, and it enables training with a 6.6x to 14x larger batch size.
arXiv Detail & Related papers (2021-04-29T05:50:54Z) - Refining activation downsampling with SoftPool [74.1840492087968]
Convolutional Neural Networks (CNNs) use pooling to decrease the size of activation maps.
We propose SoftPool: a fast and efficient method for exponentially weighted activation downsampling.
We show that SoftPool can retain more information in the reduced activation maps (a minimal sketch of this exponentially weighted scheme appears after this list).
arXiv Detail & Related papers (2021-01-02T12:09:49Z) - A Variational Information Bottleneck Based Method to Compress Sequential
Networks for Human Action Recognition [9.414818018857316]
We propose a method to effectively compress Recurrent Neural Networks (RNNs) used for Human Action Recognition (HAR).
We use a Variational Information Bottleneck (VIB) theory-based pruning approach to limit the information flow through the sequential cells of RNNs to a small subset.
We combine our pruning method with a specific group-lasso regularization technique that significantly improves compression.
It is shown that our method achieves over 70 times greater compression than the nearest competitor with comparable accuracy for the task of action recognition on UCF11.
arXiv Detail & Related papers (2020-10-03T12:41:51Z)
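Several of the related works above concern how down-sampling weights are computed; the SoftPool entry is the simplest to make concrete. Below is a minimal sketch of exponentially weighted (soft-max-weighted) down-sampling in the spirit of that paper: within each pooling window, activations are weighted by their soft-max, so larger activations dominate the output while every input still receives a gradient. The kernel size and the avg_pool-based formulation are illustrative choices, not the paper's released code.

```python
import torch
import torch.nn.functional as F


def soft_pool2d(x, kernel_size=2, stride=None):
    """Soft-max-weighted pooling over kernel_size x kernel_size windows (sketch)."""
    stride = stride or kernel_size
    e = torch.exp(x)
    # avg_pool(e * x) / avg_pool(e) equals the softmax-weighted sum within each
    # window, because the 1/k^2 normalisation factors cancel out.
    return (F.avg_pool2d(e * x, kernel_size, stride)
            / F.avg_pool2d(e, kernel_size, stride))


y = soft_pool2d(torch.randn(1, 8, 32, 32))   # -> (1, 8, 16, 16)
```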
This list is automatically generated from the titles and abstracts of the papers on this site.