On the Impact of Lossy Image and Video Compression on the Performance of
Deep Convolutional Neural Network Architectures
- URL: http://arxiv.org/abs/2007.14314v1
- Date: Tue, 28 Jul 2020 15:37:37 GMT
- Authors: Matt Poyser, Amir Atapour-Abarghouei, Toby P. Breckon
- Abstract summary: This study investigates the impact of commonplace image and video compression techniques on the performance of deep learning architectures.
We examine the impact on performance across five discrete tasks: human pose estimation, semantic segmentation, object detection, action recognition, and monocular depth estimation.
Results show a non-linear and non-uniform relationship between network performance and the level of lossy compression applied.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in generalized image understanding have seen a surge in the
use of deep convolutional neural networks (CNN) across a broad range of
image-based detection, classification and prediction tasks. Whilst the reported
performance of these approaches is impressive, this study investigates the
hitherto unapproached question of the impact of commonplace image and video
compression techniques on the performance of such deep learning architectures.
Focusing on JPEG and H.264 (MPEG-4 AVC) as representative proxies for
contemporary lossy image/video compression techniques in common use
within network-connected image/video devices and infrastructure, we examine the
impact on performance across five discrete tasks: human pose estimation,
semantic segmentation, object detection, action recognition, and monocular
depth estimation. As such, within this study we include a variety of network
architectures and domains spanning end-to-end convolution, encoder-decoder,
region-based CNN (R-CNN), dual-stream, and generative adversarial networks
(GAN). Our results show a non-linear and non-uniform relationship between
network performance and the level of lossy compression applied. Notably,
performance decreases significantly below a JPEG quality (quantization) level
of 15% and an H.264 Constant Rate Factor (CRF) of 40. However, retraining these
architectures on pre-compressed imagery recovers network performance
by up to 78.4% in some cases. Furthermore, there is a correlation between
architectures employing an encoder-decoder pipeline and those that demonstrate
resilience to lossy image compression. The characteristics of the relationship
between input compression to output task performance can be used to inform
design decisions within future image/video devices and infrastructure.
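The robustness probe described above can be sketched in a few lines: round-trip each input through JPEG at a chosen quality level before feeding it to the network under evaluation. The snippet below is a minimal illustration using Pillow (the quality values mirror those discussed in the abstract; the `recompress_jpeg` helper is ours, not the authors' pipeline):

```python
from io import BytesIO
import random

from PIL import Image  # Pillow


def recompress_jpeg(image: Image.Image, quality: int) -> Image.Image:
    """Round-trip an image through JPEG at the given quality (1-95)."""
    buf = BytesIO()
    image.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).copy()


# A deterministic noise image stands in for a dataset sample.
random.seed(0)
pixels = bytes(random.randrange(256) for _ in range(224 * 224 * 3))
image = Image.frombytes("RGB", (224, 224), pixels)

# Sweep quality levels; in a real evaluation each recompressed image
# would be passed to the task network and the metric recorded.
sizes = {}
for q in (95, 50, 15, 5):
    buf = BytesIO()
    image.save(buf, format="JPEG", quality=q)
    sizes[q] = buf.tell()

# Lower quality means coarser quantization and smaller files.
assert sizes[5] < sizes[95]
```

The same sweep can be driven through an H.264 encoder (e.g. varying CRF with FFmpeg) to reproduce the video side of the study.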
Related papers
- Semantic Ensemble Loss and Latent Refinement for High-Fidelity Neural Image Compression [62.888755394395716]
This study presents an enhanced neural compression method designed for optimal visual fidelity.
We have trained our model with a sophisticated semantic ensemble loss, integrating Charbonnier loss, perceptual loss, style loss, and a non-binary adversarial loss.
Our empirical findings demonstrate that this approach significantly improves the statistical fidelity of neural image compression.
arXiv Detail & Related papers (2024-01-25T08:11:27Z)
- Transferable Learned Image Compression-Resistant Adversarial Perturbations [69.79762292033553]
Adversarial attacks can readily disrupt image classification systems, revealing the vulnerability of DNN-based recognition tasks.
We introduce a new pipeline that targets image classification models that utilize learned image compressors as pre-processing modules.
arXiv Detail & Related papers (2024-01-06T03:03:28Z)
- Analysis of the Effect of Low-Overhead Lossy Image Compression on the Performance of Visual Crowd Counting for Smart City Applications [78.55896581882595]
Lossy image compression techniques can reduce the quality of the images, leading to accuracy degradation.
In this paper, we analyze the effect of applying low-overhead lossy image compression methods on the accuracy of visual crowd counting.
arXiv Detail & Related papers (2022-07-20T19:20:03Z)
- Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
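A redundancy penalty of the kind described above can be illustrated by measuring the off-diagonal covariance of the bottleneck codes and penalizing it, so that no two bottleneck units encode the same information. This NumPy sketch is our own illustrative version, not the paper's exact loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# A batch of 32 bottleneck codes, 16 features each.
z = rng.standard_normal((32, 16))

# Feature covariance over the batch.
zc = z - z.mean(axis=0)
cov = zc.T @ zc / (z.shape[0] - 1)

# Zero the diagonal (per-feature variance is fine); penalize only
# the cross-feature terms, which measure redundancy.
off_diag = cov - np.diag(np.diag(cov))
redundancy_penalty = float((off_diag ** 2).sum())
```

In training, `redundancy_penalty` would be added (suitably weighted) to the reconstruction loss of the autoencoder.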
arXiv Detail & Related papers (2022-02-09T18:48:02Z)
- Exploring Structural Sparsity in Neural Image Compression [14.106763725475469]
We propose a plug-in adaptive binary channel masking (ABCM) scheme to judge the importance of each convolution channel and introduce sparsity during training.
During inference, the unimportant channels are pruned to obtain a slimmer network with less computation.
Experiment results show that up to 7x computation reduction and 3x acceleration can be achieved with negligible performance drop.
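The channel-pruning step can be sketched as follows: score each output channel of a convolution layer, then drop the low-scoring ones. This NumPy version uses an L1-norm importance proxy for illustration; the ABCM method instead learns a binary mask during training:

```python
import numpy as np

rng = np.random.default_rng(0)

# A convolution weight tensor: (out_channels, in_channels, kH, kW).
weights = rng.standard_normal((8, 3, 3, 3))

# Importance score per output channel: the L1 norm of its filter
# (a common proxy; ABCM learns this decision instead).
importance = np.abs(weights).sum(axis=(1, 2, 3))

# Keep channels above the median score; prune the rest for a
# slimmer layer with proportionally less computation.
mask = importance > np.median(importance)
pruned = weights[mask]
```

Pruning half the channels of every layer roughly halves the multiply-accumulate count of the convolution stack, which is where the reported computation reduction comes from.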
arXiv Detail & Related papers (2022-02-09T17:46:49Z)
- Neural JPEG: End-to-End Image Compression Leveraging a Standard JPEG Encoder-Decoder [73.48927855855219]
We propose a system that learns to improve the encoding performance by enhancing its internal neural representations on both the encoder and decoder ends.
Experiments demonstrate that our approach successfully improves the rate-distortion performance over JPEG across various quality metrics.
arXiv Detail & Related papers (2022-01-27T20:20:03Z)
- Operationalizing Convolutional Neural Network Architectures for Prohibited Object Detection in X-Ray Imagery [15.694880385913534]
We explore the viability of two recent end-to-end object detection CNN architectures, Cascade R-CNN and FreeAnchor, for prohibited item detection.
With fewer parameters and less training time, FreeAnchor achieves the highest detection inference speed of 13 fps (3.9 ms per image).
The CNN models display substantial resilience to the lossy compression, resulting in only a 1.1% decrease in mAP at the JPEG compression level of 50.
arXiv Detail & Related papers (2021-10-10T21:20:04Z)
- NeighCNN: A CNN based SAR Speckle Reduction using Feature preserving Loss Function [1.7188280334580193]
NeighCNN is a deep learning-based speckle reduction algorithm that handles multiplicative noise.
Various synthetic as well as real SAR images are used for testing the NeighCNN architecture.
arXiv Detail & Related papers (2021-08-26T04:20:07Z)
- Generic Perceptual Loss for Modeling Structured Output Dependencies [78.59700528239141]
We show that what matters is the network structure rather than the trained weights.
We demonstrate that a randomly-weighted deep CNN can be used to model the structured dependencies of outputs.
arXiv Detail & Related papers (2021-03-18T23:56:07Z)
- Efficient CNN-LSTM based Image Captioning using Neural Network Compression [0.0]
We present an unconventional end-to-end compression pipeline for a CNN-LSTM based Image Captioning model.
We then examine the effects of different compression architectures on the model and design a compression architecture that achieves a 73.1% reduction in model size.
arXiv Detail & Related papers (2020-12-17T16:25:09Z)
- End-to-End JPEG Decoding and Artifacts Suppression Using Heterogeneous Residual Convolutional Neural Network [0.0]
Existing deep learning models separate JPEG artifact suppression from the decoding protocol as an independent task.
We take one step forward to design a true end-to-end heterogeneous residual convolutional neural network (HR-CNN) with spectrum decomposition and heterogeneous reconstruction mechanism.
arXiv Detail & Related papers (2020-07-01T17:44:00Z)