CNNs for JPEGs: A Study in Computational Cost
- URL: http://arxiv.org/abs/2309.11417v2
- Date: Fri, 22 Sep 2023 19:40:11 GMT
- Title: CNNs for JPEGs: A Study in Computational Cost
- Authors: Samuel Felipe dos Santos, Nicu Sebe, and Jurandy Almeida
- Abstract summary: Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade.
CNNs are capable of learning robust representations of the data directly from the RGB pixels.
Deep learning methods capable of learning directly from the compressed domain have been gaining attention in recent years.
- Score: 45.74830585715129
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Convolutional neural networks (CNNs) have achieved astonishing advances over
the past decade, defining state-of-the-art in several computer vision tasks.
CNNs are capable of learning robust representations of the data directly from
the RGB pixels. However, most image data are usually available in compressed
format, of which JPEG is the most widely used for transmission and storage
purposes, demanding a preliminary decoding process that has a high
computational load and memory usage. For this reason, deep learning methods
capable of learning directly from the compressed domain have been gaining
attention in recent years. These methods usually extract a frequency-domain
representation of the image, such as the DCT coefficients, via partial
decoding, and then adapt typical CNN architectures to work with it. One
limitation of these works is that, in order to accommodate the
frequency-domain data, the modifications made to the original model
significantly increase its number of parameters and computational complexity.
On one hand, the methods have faster preprocessing, since the cost of fully
decoding the images is avoided; on the other hand, the cost of passing the
images through the model is increased, mitigating the possible upside of
accelerating the method. In
this paper, we propose a further study of the computational cost of deep models
designed for the frequency domain, evaluating the cost of decoding and passing
the images through the network. We also propose handcrafted and data-driven
techniques for reducing the computational complexity and the number of
parameters of these models in order to keep them similar to their RGB
baselines, leading to efficient models with a better trade-off between
computational cost and accuracy.
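The pipeline the abstract describes, partially decoding a JPEG into per-block DCT coefficients and feeding those to a CNN, can be sketched with a block-wise DCT. This is an illustration only, not the paper's implementation: the `block_dct` helper and the (H/8, W/8, 64) channel layout are assumptions modeled on common frequency-domain CNN setups, using `scipy.fftpack.dct` to mimic the coefficients a partial JPEG decode would expose.

```python
import numpy as np
from scipy.fftpack import dct

def block_dct(img):
    """8x8 block-wise 2-D DCT of a grayscale image, mimicking the
    coefficients a partial JPEG decode would expose (illustration only)."""
    h, w = img.shape
    h8, w8 = h - h % 8, w - w % 8            # crop to a multiple of 8
    x = img[:h8, :w8].astype(np.float32) - 128.0  # JPEG level shift
    # Split into (H/8, W/8, 8, 8) blocks, then apply the DCT along
    # each block axis (separable 2-D DCT-II, orthonormal scaling).
    blocks = x.reshape(h8 // 8, 8, w8 // 8, 8).transpose(0, 2, 1, 3)
    coeffs = dct(dct(blocks, axis=2, norm='ortho'), axis=3, norm='ortho')
    # Rearrange to (H/8, W/8, 64): one channel per frequency, a common
    # input layout for frequency-domain CNNs.
    return coeffs.reshape(h8 // 8, w8 // 8, 64)

img = np.random.randint(0, 256, (64, 64)).astype(np.uint8)
feat = block_dct(img)
print(feat.shape)  # (8, 8, 64)
```

Note the shape change: the spatial resolution drops by 8x in each dimension while the channel count grows to 64, which is why adapting an RGB-trained CNN to such inputs typically requires architectural changes in the early layers.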
Related papers
- DCT-CryptoNets: Scaling Private Inference in the Frequency Domain [8.084341432899954]
The combination of fully homomorphic encryption (FHE) and machine learning offers unprecedented opportunities for private inference on sensitive data.
FHE enables computation directly on encrypted data, safeguarding the entire machine learning pipeline, including data and model confidentiality.
Existing FHE-based implementations for deep neural networks face challenges in computational cost, latency, and scalability.
This paper introduces DCT-CryptoNets, a novel approach that leverages frequency-domain learning to tackle these issues.
arXiv Detail & Related papers (2024-08-27T17:48:29Z) - Rapid-INR: Storage Efficient CPU-free DNN Training Using Implicit Neural Representation [7.539498729072623]
Implicit Neural Representation (INR) is an innovative approach for representing complex shapes or objects without explicitly defining their geometry or surface structure.
Previous research has demonstrated the effectiveness of using neural networks as INR for image compression, showcasing comparable performance to traditional methods such as JPEG.
This paper introduces Rapid-INR, a novel approach that utilizes INR for encoding and compressing images, thereby accelerating neural network training in computer vision tasks.
arXiv Detail & Related papers (2023-06-29T05:49:07Z) - Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
They are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
arXiv Detail & Related papers (2023-06-21T06:59:07Z) - T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called transformers, has shown significant performance in natural language processing.
In this paper, we design a novel attention linearly related to the resolution according to Taylor expansion, and based on this attention, a network called $T$-former is designed for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
arXiv Detail & Related papers (2023-05-12T04:10:42Z) - Restormer: Efficient Transformer for High-Resolution Image Restoration [118.9617735769827]
Convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
arXiv Detail & Related papers (2021-11-18T18:59:10Z) - Improved FRQI on superconducting processors and its restrictions in the NISQ era [62.997667081978825]
We study the feasibility of the Flexible Representation of Quantum Images (FRQI).
We also experimentally check its limits in the current noisy intermediate-scale quantum era.
We propose a method for simplifying the circuits needed for the FRQI.
arXiv Detail & Related papers (2021-10-29T10:42:43Z) - Less is More: Accelerating Faster Neural Networks Straight from JPEG [1.9214041945441434]
We show how to speed up convolutional neural networks for processing JPEG compressed data.
We exploit learning strategies to reduce the computational complexity by taking full advantage of DCT inputs.
Results show that learning how to combine all DCT inputs in a data-driven fashion is better than discarding them by hand.
arXiv Detail & Related papers (2021-04-01T01:21:24Z) - CNNs for JPEGs: A Study in Computational Cost [49.97673761305336]
Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade.
CNNs are capable of learning robust representations of the data directly from the RGB pixels.
Deep learning methods capable of learning directly from the compressed domain have been gaining attention in recent years.
arXiv Detail & Related papers (2020-12-26T15:00:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.