Related papers: VVC+M: Plug and Play Scalable Image Coding for Humans and Machines

VVC+M: Plug and Play Scalable Image Coding for Humans and Machines

URL: http://arxiv.org/abs/2305.10453v1
Date: Wed, 17 May 2023 00:22:39 GMT
Title: VVC+M: Plug and Play Scalable Image Coding for Humans and Machines
Authors: Alon Harell, Yalda Foroutan, and Ivan V. Bajic
Abstract summary: In scalable coding for humans and machines, the compressed representation used for machines is further utilized to enable input reconstruction. We propose to utilize the pre-existing residual coding capabilities of video codecs such as VVC to create a scalable from any image compression for machines (ICM) scheme.
Score: 25.062104976775448
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Compression for machines is an emerging field, where inputs are encoded while optimizing the performance of downstream automated analysis. In scalable coding for humans and machines, the compressed representation used for machines is further utilized to enable input reconstruction. Often performed by jointly optimizing the compression scheme for both machine task and human perception, this results in sub-optimal rate-distortion (RD) performance for the machine side. We focus on the case of images, proposing to utilize the pre-existing residual coding capabilities of video codecs such as VVC to create a scalable codec from any image compression for machines (ICM) scheme. Using our approach we improve an existing scalable codec to achieve superior RD performance on the machine task, while remaining competitive for human perception. Moreover, our approach can be trained post-hoc for any given ICM scheme, and without creating a coupling between the quality of the machine analysis and human vision.

Related papers

Embedding Compression Distortion in Video Coding for Machines [67.97469042910855]
Currently, video transmission serves not only the Human Visual System (HVS) for viewing but also machine perception for analysis. We propose a Compression Distortion Embedding (CDRE) framework, which extracts machine-perception-related distortion representation and embeds it into downstream models. Our framework can effectively boost the rate-task performance of existing codecs with minimal overhead in terms of execution time, and number of parameters.
arXiv Detail & Related papers (2025-03-27T13:01:53Z)
NN-VVC: Versatile Video Coding boosted by self-supervisedly learned image coding for machines [19.183883119933558]
This paper proposes a hybrid for machines called NN-VVC, which combines the advantages of an E2E-learned image and a CVC to achieve high performance in both image and video coding for machines. Our experiments show that the proposed system achieved up to -43.20% and -26.8% Bjontegaard Delta rate reduction over VVC for image and video data, respectively.
arXiv Detail & Related papers (2024-01-19T15:33:46Z)
VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels. We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
Perceptual Video Coding for Machines via Satisfied Machine Ratio Modeling [66.56355316611598]
Satisfied Machine Ratio (SMR) is a metric that evaluates the perceptual quality of compressed images and videos for machines. SMR enables perceptual coding for machines and propels Video Coding for Machines from specificity to generality.
arXiv Detail & Related papers (2022-11-13T03:16:36Z)
Preprocessing Enhanced Image Compression for Machine Vision [14.895698385236937]
We propose a preprocessing enhanced image compression method for machine vision tasks. Our framework is built upon the traditional non-differential codecs. Experimental results show our method achieves a better tradeoff between the coding and the performance of the downstream machine vision tasks by saving about 20%.
arXiv Detail & Related papers (2022-06-12T03:36:38Z)
Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image. The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z)
A New Image Codec Paradigm for Human and Machine Uses [53.48873918537017]
A new scalable image paradigm for both human and machine uses is proposed in this work. The high-level instance segmentation map and the low-level signal features are extracted with neural networks. An image is designed and trained to achieve the general-quality image reconstruction with the 16-bit gray-scale profile and signal features.
arXiv Detail & Related papers (2021-12-19T06:17:38Z)
Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics [127.65410486227007]
Video coding, which targets to compress and reconstruct the whole frame, and feature compression, which only preserves and transmits the most critical information, stand at two ends of the scale. Recent endeavors in imminent trends of video compression, e.g. deep learning based coding tools and end-to-end image/video coding, and MPEG-7 compact feature descriptor standards, promote the sustainable and fast development in their own directions. In this paper, thanks to booming AI technology, e.g. prediction and generation models, we carry out exploration in the new area, Video Coding for Machines (VCM), arising from the emerging MPEG
arXiv Detail & Related papers (2020-01-10T17:24:13Z)
An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding. We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion pattern. By learning to extract sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.