CNNs for JPEGs: A Study in Computational Cost
- URL: http://arxiv.org/abs/2309.11417v2
- Date: Fri, 22 Sep 2023 19:40:11 GMT
- Title: CNNs for JPEGs: A Study in Computational Cost
- Authors: Samuel Felipe dos Santos, Nicu Sebe, and Jurandy Almeida
- Abstract summary: Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade.
CNNs are capable of learning robust representations of the data directly from the RGB pixels.
Deep learning methods capable of learning directly from the compressed domain have been gaining attention in recent years.
- Score: 45.74830585715129
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Convolutional neural networks (CNNs) have achieved astonishing advances over
the past decade, defining state-of-the-art in several computer vision tasks.
CNNs are capable of learning robust representations of the data directly from
the RGB pixels. However, most image data are usually available in compressed
format, of which JPEG is the most widely used for transmission and storage
purposes, demanding a preliminary decoding process that has a high
computational load and memory usage. For this reason, deep learning methods
capable of learning directly from the compressed domain have been gaining
attention in recent years. These methods usually extract a frequency-domain
representation of the image, such as the DCT coefficients, via partial
decoding, and then adapt typical CNN architectures to work with it. One
limitation of these works is that, in order to accommodate the
frequency-domain data, the modifications made to the original model
significantly increase its number of parameters and computational complexity.
On one hand, the methods have faster preprocessing, since the cost of fully
decoding the images is avoided; on the other hand, the cost of passing the
images through the model is increased, mitigating the possible upside of
accelerating the method. In
this paper, we propose a further study of the computational cost of deep models
designed for the frequency domain, evaluating the cost of decoding and passing
the images through the network. We also propose handcrafted and data-driven
techniques for reducing the computational complexity and the number of
parameters of these models in order to keep them similar to their RGB
baselines, leading to efficient models with a better trade-off between
computational cost and accuracy.
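The pipeline the abstract describes, partially decoding a JPEG into per-block DCT coefficients and feeding those to a CNN, can be sketched with a block-wise DCT. This is an illustration only, not the paper's implementation: the `block_dct` helper and the (H/8, W/8, 64) channel layout are assumptions modeled on common frequency-domain CNN setups, using `scipy.fftpack.dct` to mimic the coefficients a partial JPEG decode would expose.

```python
import numpy as np
from scipy.fftpack import dct

def block_dct(img):
    """8x8 block-wise 2-D DCT of a grayscale image, mimicking the
    coefficients a partial JPEG decode would expose (illustration only)."""
    h, w = img.shape
    h8, w8 = h - h % 8, w - w % 8            # crop to a multiple of 8
    x = img[:h8, :w8].astype(np.float32) - 128.0  # JPEG level shift
    # Split into (H/8, W/8, 8, 8) blocks, then apply the DCT along
    # each block axis (separable 2-D DCT-II, orthonormal scaling).
    blocks = x.reshape(h8 // 8, 8, w8 // 8, 8).transpose(0, 2, 1, 3)
    coeffs = dct(dct(blocks, axis=2, norm='ortho'), axis=3, norm='ortho')
    # Rearrange to (H/8, W/8, 64): one channel per frequency, a common
    # input layout for frequency-domain CNNs.
    return coeffs.reshape(h8 // 8, w8 // 8, 64)

img = np.random.randint(0, 256, (64, 64)).astype(np.uint8)
feat = block_dct(img)
print(feat.shape)  # (8, 8, 64)
```

Note the shape change: the spatial resolution drops by 8x in each dimension while the channel count grows to 64, which is why adapting an RGB-trained CNN to such inputs typically requires architectural changes in the early layers.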
Related papers
- DCT-CryptoNets: Scaling Private Inference in the Frequency Domain [8.084341432899954]
The combination of fully homomorphic encryption (FHE) and machine learning offers unprecedented opportunities for private inference on sensitive data.
FHE enables computation directly on encrypted data, safeguarding the entire machine learning pipeline, including data and model confidentiality.
Existing FHE-based implementations for deep neural networks face challenges in computational cost, latency, and scalability.
This paper introduces DCT-CryptoNets, a novel approach that leverages frequency-domain learning to tackle these issues.
arXiv Detail & Related papers (2024-08-27T17:48:29Z) - Rapid-INR: Storage Efficient CPU-free DNN Training Using Implicit Neural Representation [7.539498729072623]
Implicit Neural Representation (INR) is an innovative approach for representing complex shapes or objects without explicitly defining their geometry or surface structure.
Previous research has demonstrated the effectiveness of using neural networks as INR for image compression, showcasing comparable performance to traditional methods such as JPEG.
This paper introduces Rapid-INR, a novel approach that utilizes INR for encoding and compressing images, thereby accelerating neural network training in computer vision tasks.
arXiv Detail & Related papers (2023-06-29T05:49:07Z) - Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
They are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
arXiv Detail & Related papers (2023-06-21T06:59:07Z) - T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called transformers, has shown significant performance in natural language processing.
In this paper, we design a novel attention linearly related to the resolution according to Taylor expansion, and based on this attention, a network called $T$-former is designed for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
arXiv Detail & Related papers (2023-05-12T04:10:42Z) - Restormer: Efficient Transformer for High-Resolution Image Restoration [118.9617735769827]
Convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
arXiv Detail & Related papers (2021-11-18T18:59:10Z) - Improved FRQI on superconducting processors and its restrictions in the NISQ era [62.997667081978825]
We study the feasibility of the Flexible Representation of Quantum Images (FRQI).
We also experimentally check its limits in the current noisy intermediate-scale quantum era.
We propose a method for simplifying the circuits needed for the FRQI.
arXiv Detail & Related papers (2021-10-29T10:42:43Z) - Less is More: Accelerating Faster Neural Networks Straight from JPEG [1.9214041945441434]
We show how to speed up convolutional neural networks for processing JPEG compressed data.
We exploit learning strategies to reduce the computational complexity by taking full advantage of DCT inputs.
Results show that learning how to combine all DCT inputs in a data-driven fashion is better than discarding them by hand.
arXiv Detail & Related papers (2021-04-01T01:21:24Z) - CNNs for JPEGs: A Study in Computational Cost [49.97673761305336]
Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade.
CNNs are capable of learning robust representations of the data directly from the RGB pixels.
Deep learning methods capable of learning directly from the compressed domain have been gaining attention in recent years.
arXiv Detail & Related papers (2020-12-26T15:00:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.