Image coding for machines: an end-to-end learned approach
- URL: http://arxiv.org/abs/2108.09993v1
- Date: Mon, 23 Aug 2021 07:54:42 GMT
- Title: Image coding for machines: an end-to-end learned approach
- Authors: Nam Le, Honglei Zhang, Francesco Cricri, Ramin Ghaznavi-Youvalari, Esa
Rahtu
- Abstract summary: In this paper, we propose an image codec for machines which is neural network (NN) based and end-to-end learned.
Our results show that our NN-based codec outperforms the state-of-the-art Versatile Video Coding (VVC) standard on the object detection and instance segmentation tasks.
To the best of our knowledge, this is the first end-to-end learned machine-targeted image codec.
- Score: 23.92748892163087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over recent years, deep learning-based computer vision systems have been
applied to images at an ever-increasing pace, oftentimes representing the only
type of consumption for those images. Given the dramatic explosion in the
number of images generated per day, a question arises: how much better would an
image codec targeting machine-consumption perform against state-of-the-art
codecs targeting human-consumption? In this paper, we propose an image codec
for machines which is neural network (NN) based and end-to-end learned. In
particular, we propose a set of training strategies that address the delicate
problem of balancing competing loss functions, such as computer vision task
losses, image distortion losses, and rate loss. Our experimental results show
that our NN-based codec outperforms the state-of-the-art Versatile Video
Coding (VVC) standard on the object detection and instance segmentation tasks,
achieving -37.87% and -32.90% of BD-rate gain, respectively, while being fast
thanks to its compact size. To the best of our knowledge, this is the first
end-to-end learned machine-targeted image codec.
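A quick illustration of the loss balancing described in the abstract: the training objective combines a rate term, a pixel-distortion term, and a machine-task term. The Python sketch below is only a schematic rendering of that idea under assumed components; the `encoder`, `decoder`, `entropy_model`, `task_network`, and fixed weights are placeholders, not the authors' implementation or their proposed training strategies.

```python
# Minimal sketch (not the paper's code) of weighting a rate loss, an image
# distortion loss, and a computer-vision task loss into one training objective.
# All modules passed in are hypothetical stand-ins for the actual architecture.
import torch
import torch.nn.functional as F

def training_step(x, targets, encoder, decoder, entropy_model, task_network,
                  task_loss_fn, w_rate=1.0, w_dist=1.0, w_task=1.0):
    latents = encoder(x)                                # analysis transform
    latents_hat, likelihoods = entropy_model(latents)   # quantization + rate model
    x_hat = decoder(latents_hat)                        # synthesis transform

    # Rate: estimated bits per pixel from the entropy model's likelihoods (NCHW input).
    num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
    rate = -torch.log2(likelihoods).sum() / num_pixels

    # Distortion: pixel-domain fidelity of the reconstruction.
    distortion = F.mse_loss(x_hat, x)

    # Task loss: e.g. detection or segmentation loss computed on the reconstruction.
    task_loss = task_loss_fn(task_network(x_hat), targets)

    # Weighted sum; the paper's contribution is the training strategy that balances
    # these competing terms, which this sketch does not attempt to reproduce.
    return w_rate * rate + w_dist * distortion + w_task * task_loss
```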
Related papers
- Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs [47.7670923159071]
We present a new image compression paradigm to achieve "intelligent coding for machines" by cleverly leveraging the common sense of Large Multimodal Models (LMMs).
We dub our method "SDComp" for "Semantically Disentangled Compression", and compare it with state-of-the-art codecs on a wide variety of different vision tasks.
arXiv Detail & Related papers (2024-08-16T07:23:18Z) - Rate-Distortion-Cognition Controllable Versatile Neural Image Compression [47.72668401825835]
We propose a rate-distortion-cognition controllable versatile image compression method.
Our method yields satisfactory ICM performance and flexible Rate-Distortion-Cognition controlling.
arXiv Detail & Related papers (2024-07-16T13:17:51Z) - NN-VVC: Versatile Video Coding boosted by self-supervisedly learned
image coding for machines [19.183883119933558]
This paper proposes a hybrid codec for machines called NN-VVC, which combines the advantages of an E2E-learned image codec and a conventional video codec (CVC) to achieve high performance in both image and video coding for machines.
Our experiments show that the proposed system achieved up to -43.20% and -26.8% Bjøntegaard Delta rate (BD-rate) reduction over VVC for image and video data, respectively; a generic sketch of how BD-rate is computed appears after this list.
arXiv Detail & Related papers (2024-01-19T15:33:46Z) - Bridging the gap between image coding for machines and humans [20.017766644567036]
In many use cases, such as surveillance, it is important that the visual quality is not drastically deteriorated by the compression process.
Recent works on using neural network (NN) based ICM codecs have shown significant coding gains against traditional methods.
We propose an effective decoder finetuning scheme based on adversarial training to significantly enhance the visual quality of ICM.
arXiv Detail & Related papers (2024-01-19T14:49:56Z) - Preprocessing Enhanced Image Compression for Machine Vision [14.895698385236937]
We propose a preprocessing enhanced image compression method for machine vision tasks.
Our framework is built upon traditional non-differentiable codecs.
Experimental results show our method achieves a better trade-off between the coding bitrate and the performance of the downstream machine vision tasks, saving about 20% bitrate.
arXiv Detail & Related papers (2022-06-12T03:36:38Z) - Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network, which can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
arXiv Detail & Related papers (2022-02-09T18:48:02Z) - A New Image Codec Paradigm for Human and Machine Uses [53.48873918537017]
A new scalable image codec paradigm for both human and machine uses is proposed in this work.
The high-level instance segmentation map and the low-level signal features are extracted with neural networks.
An image codec is designed and trained to achieve general-quality image reconstruction with the 16-bit gray-scale profile and signal features.
arXiv Detail & Related papers (2021-12-19T06:17:38Z) - Learned Image Coding for Machines: A Content-Adaptive Approach [24.749491401730065]
Machine-to-machine communication represents a new challenge and opens up new perspectives in the context of data compression.
We present an inference-time content-adaptive finetuning scheme that optimizes the latent representation of an end-to-end learned image codec.
Our system achieves -30.54% BD-rate over the state-of-the-art image/video codec Versatile Video Coding (VVC).
arXiv Detail & Related papers (2021-08-23T07:53:35Z) - Image Restoration by Deep Projected GSURE [115.57142046076164]
Ill-posed inverse problems appear in many image processing applications, such as deblurring and super-resolution.
We propose a new image restoration framework that is based on minimizing a loss function that includes a "projected version" of the Generalized Stein Unbiased Risk Estimator (GSURE) and parameterization of the latent image by a CNN.
arXiv Detail & Related papers (2021-02-04T08:52:46Z) - How to Exploit the Transferability of Learned Image Compression to
Conventional Codecs [25.622863999901874]
We show how learned image coding can be used as a surrogate to optimize an image for encoding.
Our approach can remodel a conventional image codec to adjust for the MS-SSIM distortion with over 20% rate improvement without any decoding overhead.
arXiv Detail & Related papers (2020-12-03T12:34:51Z) - Analyzing and Mitigating JPEG Compression Defects in Deep Learning [69.04777875711646]
We present a unified study of the effects of JPEG compression on a range of common tasks and datasets.
We show that there is a significant penalty on common performance metrics for high compression.
arXiv Detail & Related papers (2020-11-17T20:32:57Z)
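Many of the results above, including the main paper's -37.87% and -32.90% figures, are reported as Bjøntegaard Delta rate (BD-rate): the average bitrate difference between a test codec and an anchor codec at matched quality. The sketch below, referenced from the NN-VVC entry, illustrates the standard cubic-fit computation of this metric; it is generic, not taken from any of the papers above, and the example rate-quality points are invented.

```python
# Generic Bjøntegaard Delta rate (BD-rate) sketch: average bitrate difference
# (in percent) between a test codec and an anchor codec at equal quality.
# The quality axis can be PSNR or a task metric such as detection mAP.
import numpy as np

def bd_rate(anchor_rates, anchor_quality, test_rates, test_quality):
    """Negative values mean the test codec needs fewer bits than the anchor."""
    log_anchor = np.log10(np.asarray(anchor_rates, dtype=float))
    log_test = np.log10(np.asarray(test_rates, dtype=float))

    # Cubic fit of log-rate as a function of quality for each codec.
    p_anchor = np.polyfit(anchor_quality, log_anchor, 3)
    p_test = np.polyfit(test_quality, log_test, 3)

    # Integrate both fits over the overlapping quality interval.
    lo = max(min(anchor_quality), min(test_quality))
    hi = min(max(anchor_quality), max(test_quality))
    int_anchor = np.polyval(np.polyint(p_anchor), hi) - np.polyval(np.polyint(p_anchor), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)

    # Average log-rate difference over the interval, converted to a percentage.
    avg_diff = (int_test - int_anchor) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0

# Hypothetical rate-quality points (kbps / quality score), for illustration only.
print(bd_rate([100, 200, 400, 800], [30.0, 33.0, 36.0, 39.0],
              [ 80, 160, 320, 640], [30.5, 33.5, 36.5, 39.5]))
```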