Rate-Distortion in Image Coding for Machines
- URL: http://arxiv.org/abs/2209.11694v1
- Date: Wed, 21 Sep 2022 20:24:14 GMT
- Title: Rate-Distortion in Image Coding for Machines
- Authors: Alon Harell, Anderson De Andrade, and Ivan V. Bajic
- Abstract summary: In many applications, such as surveillance, images are mostly transmitted for automated analysis, and rarely seen by humans.
Traditional compression for this scenario has been shown to be inefficient in terms of bit-rate, likely due to the focus on human-based distortion metrics.
One way to create the machine side of a codec shared by humans and machines is to perform feature matching on an intermediate layer of a Deep Neural Network performing the machine task.
- Score: 26.32381277880991
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, there has been a sharp increase in transmission of images to
remote servers specifically for the purpose of computer vision. In many
applications, such as surveillance, images are mostly transmitted for automated
analysis, and rarely seen by humans. Using traditional compression for this
scenario has been shown to be inefficient in terms of bit-rate, likely due to
the focus on human-based distortion metrics. Thus, it is important to create
specific image coding methods for joint use by humans and machines. One way to
create the machine side of such a codec is to perform feature matching of some
intermediate layer in a Deep Neural Network performing the machine task. In
this work, we explore the effects of the layer choice used in training a
learnable codec for humans and machines. We prove, using the data processing
inequality, that matching features from deeper layers is preferable in the
sense of rate-distortion. Next, we confirm our findings empirically by
re-training an existing model for scalable human-machine coding. In our
experiments we show the trade-off between the human and machine sides of such a
scalable model, and discuss the benefit of using deeper layers for training in
that regard.
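The data processing inequality cited in the abstract can be checked numerically on a toy example. The sketch below is only an illustration of the inequality, not the paper's proof, which concerns rate-distortion functions and is not reproduced here. It assumes a discrete source X and models two successive network layers as random stochastic channels (X -> Y1 -> Y2); the names `mutual_information`, `random_channel`, `layer1`, and `layer2` are hypothetical and chosen for this example.

```python
import numpy as np


def mutual_information(p_x, p_y_given_x):
    """Return I(X;Y) in bits for source p_x[x] and channel p_y_given_x[x, y]."""
    p_xy = p_x[:, None] * p_y_given_x        # joint distribution P(x, y)
    p_y = p_xy.sum(axis=0)                   # marginal P(y)
    prod = p_x[:, None] * p_y[None, :]       # product of marginals P(x)P(y)
    mask = p_xy > 0                          # skip zero-probability cells
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / prod[mask])))


rng = np.random.default_rng(0)


def random_channel(n_in, n_out):
    """Random row-stochastic matrix: each row is a conditional distribution."""
    m = rng.random((n_in, n_out))
    return m / m.sum(axis=1, keepdims=True)


# Assumption: a uniform 8-symbol source stands in for the input image.
p_x = np.full(8, 1 / 8)
layer1 = random_channel(8, 6)   # X  -> Y1 (shallower features)
layer2 = random_channel(6, 4)   # Y1 -> Y2 (deeper features)

i_shallow = mutual_information(p_x, layer1)
# Markov chain X -> Y1 -> Y2: P(y2 | x) = sum over y1 of P(y1 | x) P(y2 | y1)
i_deep = mutual_information(p_x, layer1 @ layer2)

print(f"I(X;Y1) = {i_shallow:.4f} bits, I(X;Y2) = {i_deep:.4f} bits")
```

Because Y2 depends on X only through Y1, the deeper representation can never carry more information about the input than the shallower one, which is the intuition behind deeper features being cheaper to code.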
Related papers
- Exploring Compressed Image Representation as a Perceptual Proxy: A Study [1.0878040851638]
We propose an end-to-end learned image compression wherein the analysis transform is jointly trained with an object classification task.
This study affirms that the compressed latent representation can predict human perceptual distance judgments with an accuracy comparable to a custom-tailored DNN-based quality metric.
arXiv Detail & Related papers (2024-01-14T04:37:17Z)
- Human-imperceptible, Machine-recognizable Images [76.01951148048603]
Software engineers face a major conflict between developing better AI systems and keeping their distance from sensitive training data.
This paper proposes an efficient privacy-preserving learning paradigm, where images are encrypted to become "human-imperceptible, machine-recognizable".
We show that the proposed paradigm can ensure the encrypted images have become human-imperceptible while preserving machine-recognizable information.
arXiv Detail & Related papers (2023-06-06T13:41:37Z)
- Synthetic Data for Object Classification in Industrial Applications [53.180678723280145]
In object classification, capturing a large number of images per object and in different conditions is not always possible.
This work explores the creation of artificial images using a game engine to cope with limited data in the training dataset.
arXiv Detail & Related papers (2022-12-09T11:43:04Z)
- Traditional Classification Neural Networks are Good Generators: They are Competitive with DDPMs and GANs [104.72108627191041]
We show that conventional neural network classifiers can generate high-quality images comparable to state-of-the-art generative models.
We propose a mask-based reconstruction module that makes the gradients semantic-aware so that plausible images can be synthesized.
We show that our method is also applicable to text-to-image generation by regarding image-text foundation models.
arXiv Detail & Related papers (2022-11-27T11:25:35Z)
- Preprocessing Enhanced Image Compression for Machine Vision [14.895698385236937]
We propose a preprocessing enhanced image compression method for machine vision tasks.
Our framework is built upon the traditional non-differential codecs.
Experimental results show that our method achieves a better tradeoff between coding bit-rate and the performance of downstream machine vision tasks, with a saving of about 20%.
arXiv Detail & Related papers (2022-06-12T03:36:38Z)
- A New Image Codec Paradigm for Human and Machine Uses [53.48873918537017]
A new scalable image codec paradigm for both human and machine uses is proposed in this work.
The high-level instance segmentation map and the low-level signal features are extracted with neural networks.
An image is designed and trained to achieve the general-quality image reconstruction with the 16-bit gray-scale profile and signal features.
arXiv Detail & Related papers (2021-12-19T06:17:38Z)
- Image coding for machines: an end-to-end learned approach [23.92748892163087]
In this paper, we propose an image codec for machines which is neural network (NN) based and end-to-end learned.
Our results show that our NN-based codec outperforms the state-of-the-art Versatile Video Coding (VVC) standard on the object detection and instance segmentation tasks.
To the best of our knowledge, this is the first end-to-end learned machine-targeted image codec.
arXiv Detail & Related papers (2021-08-23T07:54:42Z)
- Deep Multilabel CNN for Forensic Footwear Impression Descriptor Identification [0.9786690381850356]
We employ a deep learning approach to classify footwear impression features, known as descriptors, for forensic use cases.
We develop and evaluate a technique for feeding downsampled greyscale impressions to a neural network pre-trained on data from a different domain.
arXiv Detail & Related papers (2021-02-09T19:39:28Z)
- Swapping Autoencoder for Deep Image Manipulation [94.33114146172606]
We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation.
The key idea is to encode an image with two independent components and enforce that any swapped combination maps to a realistic image.
Experiments on multiple datasets show that our model produces better results and is substantially more efficient compared to recent generative models.
arXiv Detail & Related papers (2020-07-01T17:59:57Z)
- Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach [104.02201472370801]
We come up with a novel image coding framework by leveraging both the compressive and the generative models.
By introducing advanced generative models, we train a flexible network to reconstruct images from compact feature representations and the reference pixels.
Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection.
arXiv Detail & Related papers (2020-01-09T10:37:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.