Efficient CNN-LSTM based Image Captioning using Neural Network Compression
- URL: http://arxiv.org/abs/2012.09708v1
- Date: Thu, 17 Dec 2020 16:25:09 GMT
- Title: Efficient CNN-LSTM based Image Captioning using Neural Network Compression
- Authors: Harshit Rampal, Aman Mohanty
- Abstract summary: We present an unconventional end-to-end compression pipeline for a CNN-LSTM based Image Captioning model.
We then examine the effects of different compression architectures on the model and design a compression architecture that achieves a 73.1% reduction in model size.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern Neural Networks are eminent in achieving state-of-the-art
performance on tasks in Computer Vision, Natural Language Processing and
related verticals. However, they are notorious for their voracious memory and
compute appetite, which further obstructs their deployment on resource-limited
edge devices. To enable edge deployment, researchers have developed pruning
and quantization algorithms to compress such networks without compromising
their efficacy. Such compression algorithms have broadly been evaluated on
standalone CNN and RNN architectures; in this work, we instead present an
unconventional end-to-end compression pipeline for a CNN-LSTM based Image
Captioning model. The model is trained using VGG16 or ResNet50 as an encoder
and an LSTM decoder on the Flickr8k dataset. We then examine the effects of
different compression architectures on the model and design a compression
architecture that achieves a 73.1% reduction in model size, a 71.3% reduction
in inference time and a 7.7% increase in BLEU score compared to its
uncompressed counterpart.
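As a concrete illustration of such a pipeline (a minimal PyTorch sketch under assumed hyperparameters, not the authors' released code), a ResNet50 encoder can be wired to an LSTM decoder and then compressed with magnitude pruning and dynamic quantization:

```python
# Minimal sketch (not the authors' code): a CNN-LSTM captioner with
# magnitude pruning and dynamic quantization applied post-training.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from torchvision import models

class CaptionModel(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=256, hidden_dim=512):
        super().__init__()
        cnn = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])  # drop FC head
        self.proj = nn.Linear(2048, embed_dim)   # image feature -> embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images).flatten(1)  # (B, 2048)
        feats = self.proj(feats).unsqueeze(1)    # (B, 1, E): image as first token
        words = self.embed(captions)             # (B, T, E)
        seq = torch.cat([feats, words], dim=1)   # prepend image feature
        out, _ = self.lstm(seq)
        return self.fc(out)                      # (B, T+1, vocab) logits

model = CaptionModel()

# 1) Unstructured magnitude pruning: zero out 50% of each Linear layer's weights.
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the sparsity permanent

# 2) Dynamic quantization: int8 weights for LSTM/Linear modules at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8
)
```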
Related papers
- Computer Vision Model Compression Techniques for Embedded Systems: A Survey [75.38606213726906]
This paper covers the main model compression techniques applied to computer vision tasks.
We present the characteristics of compression subareas, compare different approaches, and discuss how to choose the best technique.
We also share codes to assist researchers and new practitioners in overcoming initial implementation challenges.
arXiv Detail & Related papers (2024-08-15T16:41:55Z)
- Dynamic Semantic Compression for CNN Inference in Multi-access Edge Computing: A Graph Reinforcement Learning-based Autoencoder [82.8833476520429]
We propose a novel semantic compression method, an autoencoder-based CNN architecture (AECNN), for effective semantic extraction and compression in partial offloading.
In the semantic encoder, we introduce a feature compression module based on the channel attention mechanism in CNNs, to compress intermediate data by selecting the most informative features.
In the semantic decoder, we design a lightweight decoder to reconstruct the intermediate data through learning from the received compressed data to improve accuracy.
arXiv Detail & Related papers (2024-01-19T15:19:47Z)
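As a rough sketch of the channel-attention selection idea described above (a simplified reading, not the AECNN implementation; the module structure and top-k selection rule are assumptions):

```python
# Hedged sketch of channel-attention feature compression (not the AECNN code):
# score channels with a squeeze-and-excitation block, then transmit only the top-k.
import torch
import torch.nn as nn

class ChannelSelector(nn.Module):
    def __init__(self, channels: int, keep: int, reduction: int = 4):
        super().__init__()
        self.keep = keep
        self.score = nn.Sequential(                 # SE-style channel scoring
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                           # x: (B, C, H, W)
        w = self.score(x)                           # (B, C) channel importance
        topk = w.topk(self.keep, dim=1).indices     # most informative channels
        idx = topk.unsqueeze(-1).unsqueeze(-1).expand(-1, -1, *x.shape[2:])
        compressed = torch.gather(x, 1, idx)        # (B, keep, H, W) to transmit
        return compressed, topk                     # indices needed to reconstruct

x = torch.randn(2, 64, 28, 28)
sel = ChannelSelector(channels=64, keep=16)
z, idx = sel(x)                                     # 4x fewer channels sent onward
```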
- Attention-based Feature Compression for CNN Inference Offloading in Edge Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at the end device.
Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
arXiv Detail & Related papers (2022-11-24T18:10:01Z)
- The Devil Is in the Details: Window-based Attention for Image Compression [58.1577742463617]
Most existing learned image compression models are based on Convolutional Neural Networks (CNNs).
In this paper, we study the effects of multiple kinds of attention mechanisms for local feature learning, then introduce a more straightforward yet effective window-based local attention block.
The proposed window-based attention is very flexible and can work as a plug-and-play component to enhance CNN and Transformer models.
arXiv Detail & Related papers (2022-03-16T07:55:49Z)
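A bare-bones sketch of a window-based local attention block (a generic rendering of the idea, not the paper's implementation; window size and head count are placeholders):

```python
# Generic sketch of window-based local attention (not the paper's code):
# split the feature map into non-overlapping windows, attend within each.
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    def __init__(self, dim: int, window: int = 8, heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                 # x: (B, C, H, W); H, W divisible by window
        B, C, H, W = x.shape
        w = self.window
        # (B, C, H, W) -> (B * num_windows, w*w, C): one token sequence per window
        t = x.view(B, C, H // w, w, W // w, w)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)
        out, _ = self.attn(t, t, t)       # self-attention inside each window
        # invert the reshape back to (B, C, H, W)
        out = out.reshape(B, H // w, W // w, w, w, C)
        out = out.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
        return x + out                    # residual: plug-and-play enhancement

feats = torch.randn(1, 64, 32, 32)
print(WindowAttention(dim=64)(feats).shape)   # torch.Size([1, 64, 32, 32])
```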
- Exploring Structural Sparsity in Neural Image Compression [14.106763725475469]
We propose a plug-in adaptive binary channel masking (ABCM) module to judge the importance of each convolution channel and introduce sparsity during training.
During inference, the unimportant channels are pruned to obtain a slimmer network and less computation.
Experimental results show that up to a 7x reduction in computation and a 3x acceleration can be achieved with a negligible performance drop.
arXiv Detail & Related papers (2022-02-09T17:46:49Z)
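A toy sketch of learnable binary channel masking (a simplification of the ABCM idea using a straight-through estimator; the paper's exact gating scheme may differ):

```python
# Toy sketch of adaptive binary channel masking (simplified ABCM-like idea):
# a learnable score per channel is binarized in the forward pass, while the
# straight-through estimator lets gradients reach the scores during training.
import torch
import torch.nn as nn

class BinaryChannelMask(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.scores = nn.Parameter(torch.zeros(channels))  # learned importance

    def forward(self, x):                       # x: (B, C, H, W)
        soft = torch.sigmoid(self.scores)       # values in (0, 1)
        hard = (soft > 0.5).float()             # binary keep/drop decision
        mask = hard + soft - soft.detach()      # straight-through estimator
        return x * mask.view(1, -1, 1, 1)       # zero out unimportant channels

    def channels_to_keep(self):
        # after training, channels whose mask is 0 can be physically pruned
        return (torch.sigmoid(self.scores) > 0.5).nonzero().flatten()
```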
- Neural JPEG: End-to-End Image Compression Leveraging a Standard JPEG Encoder-Decoder [73.48927855855219]
We propose a system that learns to improve the encoding performance by enhancing its internal neural representations on both the encoder and decoder ends.
Experiments demonstrate that our approach successfully improves the rate-distortion performance over JPEG across various quality metrics.
arXiv Detail & Related papers (2022-01-27T20:20:03Z)
- Toward Compact Parameter Representations for Architecture-Agnostic Neural Network Compression [26.501979992447605]
This paper investigates compression from the perspective of compactly representing and storing trained parameters.
We leverage additive quantization, an extreme lossy compression method invented for image descriptors, to compactly represent the parameters.
We conduct experiments on MobileNet-v2, VGG-11, ResNet-50, Feature Pyramid Networks, and pruned DNNs trained for classification, detection, and segmentation tasks.
arXiv Detail & Related papers (2021-11-19T17:03:11Z)
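A simplified sketch of the additive quantization idea (greedy residual k-means over weight chunks; the paper's encoder is more elaborate, and all sizes here are placeholder assumptions):

```python
# Simplified sketch of additive (residual) quantization of weight vectors:
# each d-dim weight chunk is approximated by a sum of codewords, so only a few
# small indices per chunk plus shared codebooks need to be stored.
import torch

def fit_codebook(vectors, k=256, iters=10):
    # plain k-means to learn one codebook stage
    centers = vectors[torch.randperm(len(vectors))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(vectors, centers).argmin(dim=1)
        for j in range(k):
            pts = vectors[assign == j]
            if len(pts):
                centers[j] = pts.mean(dim=0)
    return centers

def encode(weights, d=8, stages=2, k=256):
    chunks = weights.reshape(-1, d)             # split weights into d-dim chunks
    residual, codebooks, codes = chunks, [], []
    for _ in range(stages):                     # greedy stage-by-stage fit
        cb = fit_codebook(residual, k)
        idx = torch.cdist(residual, cb).argmin(dim=1)
        residual = residual - cb[idx]           # quantize what is left over
        codebooks.append(cb)
        codes.append(idx)
    return codebooks, codes

def decode(codebooks, codes):
    return sum(cb[idx] for cb, idx in zip(codebooks, codes))

w = torch.randn(4096, 64)                       # e.g. one linear layer's weights
books, codes = encode(w.flatten())              # 16 bits/chunk vs 256 bits raw
w_hat = decode(books, codes).reshape(w.shape)   # lossy reconstruction
```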
- Joint Matrix Decomposition for Deep Convolutional Neural Networks Compression [5.083621265568845]
Deep convolutional neural networks (CNNs) with a large number of parameters require huge computational resources.
Decomposition-based methods, therefore, have been utilized to compress CNNs in recent years.
We propose to compress CNNs and alleviate performance degradation via joint matrix decomposition.
arXiv Detail & Related papers (2021-07-09T12:32:10Z)
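To make the decomposition idea concrete, here is a generic truncated-SVD factorization of a single fully connected layer (an illustration only; the paper's joint scheme shares factors across layers):

```python
# Generic low-rank factorization of a linear layer via truncated SVD
# (a single-layer illustration; the paper decomposes layers jointly).
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    W = layer.weight.data                        # (out, in)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = Vh[:rank]                                # (rank, in)
    B = U[:, :rank] * S[:rank]                   # (out, rank)
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = A
    second.weight.data = B
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)          # ~rank*(in+out) vs in*out params

layer = nn.Linear(1024, 1024)
low_rank = factorize_linear(layer, rank=64)      # roughly 8x fewer parameters
x = torch.randn(2, 1024)
print((layer(x) - low_rank(x)).abs().max())      # approximation error
```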
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme that combines channel pruning and tensor decomposition to compress CNN models.
We achieve a 52.9% FLOPs reduction by removing 48.4% of the parameters of ResNet-50, with only a 0.56% drop in Top-1 accuracy on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
- On the Impact of Lossy Image and Video Compression on the Performance of Deep Convolutional Neural Network Architectures [17.349420462716886]
This study investigates the impact of commonplace image and video compression techniques on the performance of deep learning architectures.
We examine the impact on performance across five discrete tasks: human pose estimation, semantic segmentation, object detection, action recognition, and monocular depth estimation.
Results show a non-linear and non-uniform relationship between network performance and the level of lossy compression applied.
arXiv Detail & Related papers (2020-07-28T15:37:37Z)
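The kind of sweep behind such a study can be sketched as follows (placeholder model, input file, and quality levels; not the study's exact protocol):

```python
# Sketch of a lossy-compression robustness sweep (placeholder model and data,
# not the study's protocol): re-encode inputs as JPEG at decreasing quality
# and record how the model's predictions degrade.
import io
import torch
from PIL import Image
from torchvision import models, transforms

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
prep = transforms.Compose([transforms.Resize(256), transforms.CenterCrop(224),
                           transforms.ToTensor()])

def jpeg_roundtrip(img: Image.Image, quality: int) -> Image.Image:
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)  # lossy re-encode in memory
    buf.seek(0)
    return Image.open(buf).convert("RGB")

@torch.no_grad()
def top1(img: Image.Image) -> int:
    return model(prep(img).unsqueeze(0)).argmax(dim=1).item()

img = Image.open("example.jpg").convert("RGB")     # placeholder input image
for q in (95, 75, 50, 25, 10, 5):
    print(q, top1(jpeg_roundtrip(img, q)))         # watch predictions drift
```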
- Compression strategies and space-conscious representations for deep neural networks [0.3670422696827526]
Recent advances in deep learning have made available powerful convolutional neural networks (CNNs) with state-of-the-art performance in several real-world applications.
However, CNNs have millions of parameters and are therefore not deployable on resource-limited platforms.
In this paper, we investigate the impact of lossy compression of CNNs by weight pruning and quantization.
arXiv Detail & Related papers (2020-07-15T19:41:19Z)