RNNPool: Efficient Non-linear Pooling for RAM Constrained Inference
- URL: http://arxiv.org/abs/2002.11921v2
- Date: Thu, 22 Oct 2020 21:05:24 GMT
- Title: RNNPool: Efficient Non-linear Pooling for RAM Constrained Inference
- Authors: Oindrila Saha, Aditya Kusupati, Harsha Vardhan Simhadri, Manik Varma,
Prateek Jain
- Abstract summary: We introduce RNNPool, a novel pooling operator based on Recurrent Neural Networks (RNNs).
An RNNPool layer can effectively replace multiple blocks in architectures such as MobileNets and DenseNet when applied to standard vision tasks like image classification and face detection.
We use RNNPool with the standard S3FD architecture to construct a face detection method that achieves state-of-the-art mAP for tiny ARM Cortex-M4 class microcontrollers with under 256 KB of RAM.
- Score: 24.351577383531616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Standard Convolutional Neural Networks (CNNs) designed for computer vision
tasks tend to have large intermediate activation maps. These require large
working memory and are thus unsuitable for deployment on resource-constrained
devices typically used for inference on the edge. Aggressively downsampling the
images via pooling or strided convolutions can address the problem but leads to
a significant decrease in accuracy due to gross aggregation of the feature map
by standard pooling operators. In this paper, we introduce RNNPool, a novel
pooling operator based on Recurrent Neural Networks (RNNs), that efficiently
aggregates features over large patches of an image and rapidly downsamples
activation maps. Empirical evaluation indicates that an RNNPool layer can
effectively replace multiple blocks in a variety of architectures such as
MobileNets and DenseNet when applied to standard vision tasks like image
classification and face detection. That is, RNNPool can significantly decrease
computational complexity and peak memory usage for inference while retaining
comparable accuracy. We use RNNPool with the standard S3FD architecture to
construct a face detection method that achieves state-of-the-art mAP for tiny
ARM Cortex-M4 class microcontrollers with under 256 KB of RAM. Code is released
at https://github.com/Microsoft/EdgeML.
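To make the operator concrete, below is a minimal PyTorch sketch of an RNNPool-style layer applied to a single patch. GRU cells stand in for the lighter recurrent cells used in the paper, the hidden sizes are illustrative, and the released EdgeML code remains the reference implementation.

```python
# Minimal sketch of an RNNPool-style operator on a single patch: a first RNN
# sweeps every row and every column, and a second bidirectional RNN summarizes
# the row and column summaries into one pooled vector.
import torch
import torch.nn as nn

class RNNPoolSketch(nn.Module):
    """Summarizes an (r x c) patch of C feature maps into one 4*h2 vector."""
    def __init__(self, in_channels: int, h1: int = 8, h2: int = 8):
        super().__init__()
        self.rnn1 = nn.GRU(in_channels, h1, batch_first=True)
        self.rnn2 = nn.GRU(h1, h2, batch_first=True, bidirectional=True)

    def forward(self, patch: torch.Tensor) -> torch.Tensor:
        B, C, r, c = patch.shape
        # Pass 1a: sweep rnn1 along each row -> one h1-dim summary per row.
        rows = patch.permute(0, 2, 3, 1).reshape(B * r, c, C)
        _, row_h = self.rnn1(rows)                    # (1, B*r, h1)
        row_h = row_h.squeeze(0).view(B, r, -1)
        # Pass 1b: sweep rnn1 along each column -> one summary per column.
        cols = patch.permute(0, 3, 2, 1).reshape(B * c, r, C)
        _, col_h = self.rnn1(cols)
        col_h = col_h.squeeze(0).view(B, c, -1)
        # Pass 2: bidirectional rnn2 over row summaries and column summaries;
        # concatenate the four final states as the pooled output.
        _, hr = self.rnn2(row_h)                      # (2, B, h2)
        _, hc = self.rnn2(col_h)
        return torch.cat([hr[0], hr[1], hc[0], hc[1]], dim=1)  # (B, 4*h2)
```

Strided across an activation map, each patch collapses to a single 4*h2-channel output pixel, which is how one such layer can stand in for several convolution-plus-pooling blocks of downsampling.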
Related papers
- Rapid-INR: Storage Efficient CPU-free DNN Training Using Implicit Neural Representation [7.539498729072623]
Implicit Neural Representation (INR) is an innovative approach for representing complex shapes or objects without explicitly defining their geometry or surface structure.
Previous research has demonstrated the effectiveness of using neural networks as INR for image compression, showcasing comparable performance to traditional methods such as JPEG.
This paper introduces Rapid-INR, a novel approach that utilizes INR for encoding and compressing images, thereby accelerating neural network training in computer vision tasks.
arXiv Detail & Related papers (2023-06-29T05:49:07Z)
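As a sketch of the underlying INR idea, the network below fits a small coordinate MLP to one image so that the MLP weights become the compressed representation; the layer sizes and training loop are illustrative assumptions, not the Rapid-INR configuration.

```python
# An implicit neural representation of one image: an MLP maps normalized
# (x, y) pixel coordinates to RGB values; storing the image means storing
# the MLP's weights.
import torch
import torch.nn as nn

class ImageINR(nn.Module):
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        return self.net(coords)                  # coords: (N, 2) in [-1, 1]

def fit(inr: ImageINR, image: torch.Tensor, steps: int = 1000, lr: float = 1e-3):
    """Regress pixel colors of an (H, W, 3) image in [0, 1] onto the INR."""
    H, W, _ = image.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    target = image.reshape(-1, 3)
    opt = torch.optim.Adam(inr.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(inr(coords), target)
        loss.backward()
        opt.step()
```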
- Pooling Revisited: Your Receptive Field is Suboptimal [35.11562214480459]
The size and shape of the receptive field determine how the network aggregates local information.
We propose a simple yet effective Dynamically Optimized Pooling operation, referred to as DynOPool.
Our experiments show that the models equipped with the proposed learnable resizing module outperform the baseline networks on multiple datasets in image classification and semantic segmentation.
arXiv Detail & Related papers (2022-05-30T17:03:40Z)
- a novel attention-based network for fast salient object detection [14.246237737452105]
In current salient object detection networks, the most popular design is the U-shaped structure.
We propose a new deep convolution network architecture with three contributions.
Results demonstrate that the proposed method can compress the model to roughly 1/3 of its original size with almost no loss of accuracy.
arXiv Detail & Related papers (2021-12-20T12:30:20Z)
- AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling [82.08631594071656]
Pooling layers are essential building blocks of Convolutional Neural Networks (CNNs).
We propose an adaptive and exponentially weighted pooling method named adaPool.
We demonstrate how adaPool improves the preservation of detail through a range of tasks including image and video classification and object detection.
arXiv Detail & Related papers (2021-11-01T08:50:37Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
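A hedged sketch of the patch-by-patch idea for a CNN's memory-heavy first stage: each input patch (plus a halo of context) is processed on its own, so only one patch's activations are alive at a time, and the per-patch outputs are stitched back together before the remaining layers run whole-image. The patch count, halo size, and stride-2 assumption are illustrative; MCUNetV2 derives these from the receptive field and automates the choice.

```python
# Patch-by-patch inference for one downsampling stage. Assumes the stage
# exactly halves spatial size, H and W divisible by 2 * n_patches, and an
# even halo; a halo >= the stage's receptive-field radius makes interior
# outputs match whole-image inference.
import torch
import torch.nn as nn

@torch.no_grad()
def patchwise_stage(stage: nn.Module, x: torch.Tensor,
                    n_patches: int = 2, halo: int = 4) -> torch.Tensor:
    B, C, H, W = x.shape
    ph, pw = H // n_patches, W // n_patches
    rows = []
    for i in range(n_patches):
        cols = []
        for j in range(n_patches):
            # Crop the patch plus its halo so border outputs see context.
            t, l = max(0, i * ph - halo), max(0, j * pw - halo)
            b, r = min(H, (i + 1) * ph + halo), min(W, (j + 1) * pw + halo)
            y = stage(x[:, :, t:b, l:r])
            # Trim the halo in output coordinates (stride 2 halves offsets).
            ot, ol = (i * ph - t) // 2, (j * pw - l) // 2
            cols.append(y[:, :, ot:ot + ph // 2, ol:ol + pw // 2])
        rows.append(torch.cat(cols, dim=3))
    return torch.cat(rows, dim=2)
```

Peak activation memory falls roughly in proportion to the number of patches, at the cost of recomputing the overlapping halo regions.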
- Tied & Reduced RNN-T Decoder [0.0]
We study ways to make the RNN-T decoder (prediction network + joint network) smaller and faster without degradation in recognition performance.
Our prediction network performs a simple weighted averaging of the input embeddings, and shares its embedding matrix weights with the joint network's output layer.
This simple design, when used in conjunction with additional Edit-based Minimum Bayes Risk (EMBR) training, reduces the RNN-T decoder from 23M parameters to just 2M, without affecting word-error rate (WER).
arXiv Detail & Related papers (2021-09-15T18:19:16Z)
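A minimal sketch of the two ideas above, under assumed sizes: the prediction network is just a learned weighted average of the last K label embeddings (no recurrence), and the joint network's output layer reuses the embedding matrix. The actual RNN-T joint and training setup are more involved.

```python
# Tied-and-reduced decoder sketch: embedding averaging + weight tying.
import torch
import torch.nn as nn

class TiedReducedDecoder(nn.Module):
    def __init__(self, vocab: int, dim: int, context: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        # One learned mixing weight per context position.
        self.mix = nn.Parameter(torch.ones(context) / context)

    def predict(self, prev_labels: torch.Tensor) -> torch.Tensor:
        # prev_labels: (B, K) most recent label ids; weighted-average their
        # embeddings instead of running a recurrent prediction network.
        emb = self.embed(prev_labels)                     # (B, K, dim)
        return (self.mix.softmax(0)[None, :, None] * emb).sum(1)

    def joint(self, enc: torch.Tensor, pred: torch.Tensor) -> torch.Tensor:
        # enc, pred: (B, dim). Tie the output layer to the embedding matrix.
        h = torch.tanh(enc + pred)
        return h @ self.embed.weight.t()                  # (B, vocab) logits
```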
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme that uses {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
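A small sketch of the encoding idea, assuming uniform unsigned quantization: an M-bit integer tensor decomposes into M binary {-1, +1} tensors plus per-branch scales and a constant offset, so each branch can use cheap binary arithmetic. The paper's exact scheme and acceleration details may differ.

```python
# Decompose q = sum_i 2^i * b_i (b_i in {0,1}) into {-1,+1} branches via
# b_i = (beta_i + 1) / 2, giving q = sum_i 2^(i-1) * beta_i + (2^M - 1) / 2.
import torch

def binary_decompose(q: torch.Tensor, bits: int):
    """q: integer tensor in [0, 2**bits - 1]. Returns (branches, scales,
    offset) with q == sum_i scales[i] * branches[i] + offset."""
    branches, scales = [], []
    for i in range(bits):
        b = (q >> i) & 1                 # i-th binary digit in {0, 1}
        branches.append(2 * b - 1)       # map to {-1, +1}
        scales.append(2.0 ** (i - 1))
    offset = (2 ** bits - 1) / 2.0       # constant from the "+1" halves
    return branches, scales, offset

# Check the reconstruction on random 2-bit "weights".
q = torch.randint(0, 4, (3, 3))
branches, scales, offset = binary_decompose(q, bits=2)
recon = sum(s * b for s, b in zip(scales, branches)) + offset
assert torch.equal(recon, q.float())
```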
- Refining activation downsampling with SoftPool [74.1840492087968]
Convolutional Neural Networks (CNNs) use pooling to decrease the size of activation maps.
We propose SoftPool: a fast and efficient method for exponentially weighted activation downsampling.
We show that SoftPool can retain more information in the reduced activation maps.
arXiv Detail & Related papers (2021-01-02T12:09:49Z)
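A minimal sketch of the exponentially weighted downsampling described above: each pooling window is reduced to a softmax-weighted average of its own activations, so large responses dominate without being the only survivors (a simplified reading of SoftPool, not the authors' optimized implementation).

```python
# SoftPool-style pooling: sum(exp(a) * a) / sum(exp(a)) per window.
import torch
import torch.nn.functional as F

def soft_pool2d(x: torch.Tensor, kernel: int = 2) -> torch.Tensor:
    # x: (B, C, H, W) with H and W divisible by `kernel`. Note exp() can
    # overflow for very large activations; real implementations are careful.
    w = torch.exp(x)
    # Average-pool numerator and denominator; the 1/k^2 factors cancel.
    return F.avg_pool2d(x * w, kernel) / F.avg_pool2d(w, kernel)
```

Compared with max pooling, every activation in the window contributes with weight proportional to its exponential, which is what "retaining more information" refers to.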
- Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We find that the accuracy decline is due to activation quantization, and replace the conventional ReLU with a Bounded ReLU.
Our integer networks achieve performance equivalent to the corresponding full-precision networks, but with only 1/4 the memory cost, and run 2x faster on modern GPUs.
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
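A tiny sketch of why bounding the ReLU helps integer-only inference: a fixed activation range gives a constant quantization scale, so activations map to 8-bit integers without data-dependent calibration. The bound of 6 and the uint8 mapping are assumptions for illustration.

```python
# Bounded ReLU and a fixed-scale 8-bit activation quantizer.
import torch

def bounded_relu(x: torch.Tensor, bound: float = 6.0) -> torch.Tensor:
    return x.clamp(min=0.0, max=bound)

def quantize_uint8(x: torch.Tensor, bound: float = 6.0):
    scale = bound / 255.0                # constant: the range is known a priori
    q = torch.round(bounded_relu(x, bound) / scale).to(torch.uint8)
    return q, scale                      # dequantize with q.float() * scale
```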
- When Residual Learning Meets Dense Aggregation: Rethinking the Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with neural architecture search based models to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z)