Condensation-Net: Memory-Efficient Network Architecture with
Cross-Channel Pooling Layers and Virtual Feature Maps
- URL: http://arxiv.org/abs/2104.14124v1
- Date: Thu, 29 Apr 2021 05:44:02 GMT
- Title: Condensation-Net: Memory-Efficient Network Architecture with
Cross-Channel Pooling Layers and Virtual Feature Maps
- Authors: Tse-Wei Chen, Motoki Yoshinaga, Hongxing Gao, Wei Tao, Dongchao Wen,
Junjie Liu, Kinya Osa, Masami Kato
- Abstract summary: We propose an algorithm to process a specific network architecture (Condensation-Net) without increasing the maximum memory storage for feature maps.
Cross-channel pooling can improve the accuracy of object detection tasks, such as face detection.
The overhead of supporting cross-channel pooling with the proposed hardware architecture is negligibly small.
- Score: 28.992851280809205
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: "Lightweight convolutional neural networks" is an important research topic in
the field of embedded vision. To implement image recognition tasks on a
resource-limited hardware platform, it is necessary to reduce the memory size
and the computational cost. The contributions of this paper are as
follows. First, we propose an algorithm to process a specific network
architecture (Condensation-Net) without increasing the maximum memory storage
for feature maps. The architecture for virtual feature maps saves 26.5% of
memory bandwidth by calculating the results of cross-channel pooling before
storing the feature map into the memory. Second, we show that cross-channel
pooling can improve the accuracy of object detection tasks, such as face
detection, because it increases the number of filter weights. Compared with
Tiny-YOLOv2, the improvement of accuracy is 2.0% for quantized networks and
1.5% for full-precision networks when the false-positive rate is 0.1. Last but
not least, the analysis results show that the overhead of supporting
cross-channel pooling with the proposed hardware architecture is negligibly
small. The extra memory cost to support Condensation-Net is 0.2% of the total
size, and the extra gate count is only 1.0% of the total size.
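The core operation is small enough to show in a few lines. Below is a minimal NumPy sketch of cross-channel pooling applied to the output of a toy 1x1 convolution; the group size of 4, the max operator, and the exact fusion point are illustrative assumptions, not details taken from the paper's hardware design.

```python
import numpy as np

def conv2d_1x1(x, w):
    """Toy 1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.tensordot(w, x, axes=([1], [0]))            # -> (C_out, H, W)

def cross_channel_max_pool(fmap, group):
    """Max over every `group` consecutive channels: (G*C, H, W) -> (C, H, W).
    Only the condensed maps would ever be written to external memory."""
    c, h, w = fmap.shape
    assert c % group == 0
    return fmap.reshape(c // group, group, h, w).max(axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16)).astype(np.float32)   # input feature map
w = rng.standard_normal((32, 8)).astype(np.float32)       # 4x as many filters

wide = conv2d_1x1(x, w)                                    # (32, 16, 16) "virtual" maps
condensed = cross_channel_max_pool(wide, group=4)          # (8, 16, 16) stored maps
print(wide.shape, condensed.shape)
```

Consistent with the abstract, the point of the sketch is that the pooled result can be accumulated as each group of filters is computed, so the wide "virtual" tensor never has to be written out in full; only the condensed maps reach memory.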
Related papers
- Designing Extremely Memory-Efficient CNNs for On-device Vision Tasks [2.9835839258066015]
We introduce a memory-efficient CNN (convolutional neural network) for on-device vision tasks.
The proposed network classifies ImageNet with extremely low memory (i.e., 63 KB) while achieving competitive top-1 accuracy (i.e., 61.58%).
To the best of our knowledge, the memory usage of the proposed network is far smaller than state-of-the-art memory-efficient networks.
arXiv Detail & Related papers (2024-08-07T10:04:04Z) - Rediscovering Hashed Random Projections for Efficient Quantization of
Contextualized Sentence Embeddings [113.38884267189871]
Training and inference on edge devices often requires an efficient setup due to computational limitations.
Pre-computing data representations and caching them on a server can mitigate extensive edge device computation.
We propose a simple, yet effective approach that uses random hyperplane projections.
We show that the hashed embeddings remain effective for training models across various English and German sentence classification tasks, retaining 94%--99% of their floating-point performance.
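The random-hyperplane idea is standard locality-sensitive hashing; the sketch below shows the generic form (the 256-bit code length and plain sign thresholding are assumptions, not the paper's exact setup).

```python
import numpy as np

def hash_embeddings(emb, n_bits=256, seed=0):
    """Hash float embeddings to n_bits-bit binary codes with random hyperplanes.

    emb: (N, D) float matrix.  Bit j of a code is 1 iff the embedding falls on
    the positive side of random hyperplane j (sign of a random projection)."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((emb.shape[1], n_bits))
    return (emb @ planes > 0).astype(np.uint8)

# Toy usage: 768-dimensional sentence embeddings -> 256-bit codes.
emb = np.random.default_rng(1).standard_normal((4, 768)).astype(np.float32)
codes = hash_embeddings(emb)
print(codes.shape)                                         # (4, 256)
```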
arXiv Detail & Related papers (2023-03-13T10:53:00Z) - Self-Attentive Pooling for Efficient Deep Learning [6.822466048176652]
We propose a novel non-local self-attentive pooling method that can be used as a drop-in replacement for standard pooling layers.
We surpass the test accuracy of existing pooling techniques on different variants of MobileNet-V2 on ImageNet by an average of 1.2%.
Our approach achieves 1.43% higher test accuracy compared to SOTA techniques with iso-memory footprints.
arXiv Detail & Related papers (2022-09-16T00:35:14Z) - MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
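As a rough illustration of patch-by-patch scheduling, the toy sketch below runs a stand-in for the memory-hungry early stage one stripe at a time; the stripe count and the per-pixel stand-in are assumptions, and real convolutional stages additionally need overlapping halo regions at patch borders.

```python
import numpy as np

def heavy_stage(x):
    """Stand-in for the memory-hungry early layers.  A per-pixel op is used so
    that patches compose exactly; real convolutions also need overlapping
    halo pixels at patch borders."""
    return np.maximum(2.0 * x + 1.0, 0.0)

def run_whole_image(x):
    # Peak activation memory is one full-resolution tensor.
    return heavy_stage(x)

def run_patch_by_patch(x, n_patches=4):
    # Schedule the early stage one horizontal stripe at a time; in an actual
    # deployment each stripe's (already downsampled) output is consumed
    # immediately, so the full-resolution activation of only one stripe is
    # live at any moment.
    stripes = np.array_split(x, n_patches, axis=1)         # split along height
    return np.concatenate([heavy_stage(s) for s in stripes], axis=1)

x = np.random.default_rng(0).standard_normal((3, 224, 224)).astype(np.float32)
assert np.allclose(run_whole_image(x), run_patch_by_patch(x))
```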
arXiv Detail & Related papers (2021-10-28T17:58:45Z) - Group Fisher Pruning for Practical Network Compression [58.25776612812883]
We present a general channel pruning approach that can be applied to various complicated structures.
We derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels.
Our method can be used to prune any structures including those with coupled channels.
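A rough sketch of a Fisher-style channel score of the kind the summary describes (the mask-gradient formulation and the way coupled channels are grouped here are generic assumptions, not the paper's exact derivation):

```python
import numpy as np

def fisher_channel_scores(acts, grads):
    """Fisher-style importance per channel of one layer.

    acts, grads: lists of (C, H, W) arrays per sample -- the activation and the
    loss gradient w.r.t. it.  The gradient w.r.t. a per-channel mask m (applied
    as a*m, evaluated at m=1) is sum_{h,w} a * dL/da; the score accumulates
    its square over samples, an empirical Fisher estimate."""
    scores = np.zeros(acts[0].shape[0])
    for a, g in zip(acts, grads):
        mask_grad = (a * g).reshape(a.shape[0], -1).sum(axis=1)
        scores += mask_grad ** 2
    return scores

def grouped_scores(per_layer_scores):
    """Coupled channels (e.g., layers joined by a residual add) must be pruned
    together, so their per-layer scores are summed into one group score."""
    return np.sum(per_layer_scores, axis=0)

a = [np.random.default_rng(i).standard_normal((16, 8, 8)) for i in range(4)]
g = [np.random.default_rng(10 + i).standard_normal((16, 8, 8)) for i in range(4)]
print(fisher_channel_scores(a, g).shape)                   # (16,)
```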
arXiv Detail & Related papers (2021-08-02T08:21:44Z) - PocketNet: A Smaller Neural Network for 3D Medical Image Segmentation [0.0]
We derive a new CNN architecture called PocketNet that achieves comparable segmentation results to conventional CNNs while using less than 3% of the number of parameters.
arXiv Detail & Related papers (2021-04-21T20:10:30Z) - Refining activation downsampling with SoftPool [74.1840492087968]
Convolutional Neural Networks (CNNs) use pooling to decrease the size of activation maps.
We propose SoftPool: a fast and efficient method for exponentially weighted activation downsampling.
We show that SoftPool can retain more information in the reduced activation maps.
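SoftPool replaces the max or average over each pooling window with a softmax-weighted sum of the window's activations. A minimal NumPy sketch for non-overlapping k x k windows follows; this is a simplification of the general kernel-and-stride formulation.

```python
import numpy as np

def softpool2d(x, k=2):
    """SoftPool over non-overlapping k x k windows of a (C, H, W) tensor.

    Each output is the softmax-weighted sum of the activations in its window,
    y = sum_i softmax(a)_i * a_i, so large activations dominate (as in max
    pooling) but every activation still contributes (as in average pooling)."""
    c, h, w = x.shape
    assert h % k == 0 and w % k == 0
    win = x.reshape(c, h // k, k, w // k, k).transpose(0, 1, 3, 2, 4)
    win = win.reshape(c, h // k, w // k, k * k)
    e = np.exp(win - win.max(axis=-1, keepdims=True))      # stable softmax
    wgt = e / e.sum(axis=-1, keepdims=True)
    return (wgt * win).sum(axis=-1)

x = np.random.default_rng(0).standard_normal((8, 32, 32)).astype(np.float32)
print(softpool2d(x).shape)                                 # (8, 16, 16)
```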
arXiv Detail & Related papers (2021-01-02T12:09:49Z) - Towards Lossless Binary Convolutional Neural Networks Using Piecewise
Approximation [4.023728681102073]
Binary CNNs can significantly reduce the number of arithmetic operations and the size of memory storage.
However, the accuracy degradation of single and multiple binary CNNs is unacceptable for modern architectures.
We propose a Piecewise Approximation scheme for multiple binary CNNs which lessens accuracy loss by approximating full precision weights and activations.
arXiv Detail & Related papers (2020-08-08T13:32:33Z) - Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace conventional ReLU with Bounded ReLU and find that the accuracy decline is due to activation quantization.
Our integer networks achieve equivalent performance as the corresponding FPN networks, but have only 1/4 memory cost and run 2x faster on modern GPU.
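A minimal sketch of why a bounded activation helps integer-only inference: clipping to a known range makes the quantization scale a constant fixed ahead of time (the bound of 6.0 and the uint8 format below are assumptions, not the paper's settings).

```python
import numpy as np

def bounded_relu(x, bound=6.0):
    """ReLU clipped to [0, bound]; a fixed activation range means the
    quantization scale is a constant known ahead of time."""
    return np.clip(x, 0.0, bound)

def quantize_uint8(x, bound=6.0):
    """Map activations in [0, bound] to uint8 with a fixed linear scale."""
    scale = bound / 255.0
    return np.rint(x / scale).astype(np.uint8), scale

x = np.random.default_rng(0).standard_normal(5).astype(np.float32) * 4.0
a = bounded_relu(x)
q, s = quantize_uint8(a)
print(a, q, q.astype(np.float32) * s)                      # dequantized ~ a
```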
arXiv Detail & Related papers (2020-06-21T08:23:03Z) - Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)