Perceptual Video Coding for Machines via Satisfied Machine Ratio Modeling
- URL: http://arxiv.org/abs/2211.06797v3
- Date: Tue, 9 Jan 2024 13:02:50 GMT
- Title: Perceptual Video Coding for Machines via Satisfied Machine Ratio Modeling
- Authors: Qi Zhang, Shanshe Wang, Xinfeng Zhang, Chuanmin Jia, Zhao Wang, Siwei Ma, Wen Gao
- Abstract summary: Satisfied Machine Ratio (SMR) is a metric that evaluates the perceptual quality of compressed images and videos for machines.
SMR enables perceptual coding for machines and propels Video Coding for Machines from specificity to generality.
- Score: 66.56355316611598
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video Coding for Machines (VCM) aims to compress visual signals for machine
analysis. However, existing methods only consider a few machines, neglecting
the majority. Moreover, the machine's perceptual characteristics are not
leveraged effectively, resulting in suboptimal compression efficiency. To
overcome these limitations, this paper introduces Satisfied Machine Ratio
(SMR), a metric that statistically evaluates the perceptual quality of
compressed images and videos for machines by aggregating satisfaction scores
from them. Each score is derived from machine perceptual differences between
original and compressed images. Targeting image classification and object
detection tasks, we build two representative machine libraries for SMR
annotation and create a large-scale SMR dataset to facilitate SMR studies. We
then propose an SMR prediction model based on the correlation between deep
feature differences and SMR. Furthermore, we introduce an auxiliary task to
increase the prediction accuracy by predicting the SMR difference between two
images of different quality. Extensive experiments demonstrate that SMR models
significantly improve compression performance for machines and exhibit robust
generalizability on unseen machines, codecs, datasets, and frame types. SMR
enables perceptual coding for machines and propels VCM from specificity to
generality. Code is available at https://github.com/ywwynm/SMR.
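As a rough illustration of the SMR idea in the abstract above, SMR can be read as the fraction of machines in a library that remain satisfied with a compressed image. The scoring function, threshold, and toy machine library below are illustrative assumptions, not the paper's exact formulation:

```python
def satisfaction_score(p_original, p_compressed):
    """Toy per-machine score: 1.0 minus the absolute difference of the
    machine's top-class confidence on the original vs. the compressed image."""
    return 1.0 - abs(p_original - p_compressed)

def satisfied_machine_ratio(scores, threshold=0.9):
    """Fraction of machines whose satisfaction score meets the threshold,
    i.e., the SMR of one compressed image over the machine library."""
    satisfied = sum(1 for s in scores if s >= threshold)
    return satisfied / len(scores)

# A hypothetical library of five machines: (confidence on original,
# confidence on compressed) for the same top class.
pairs = [(0.95, 0.93), (0.88, 0.70), (0.99, 0.97), (0.90, 0.89), (0.85, 0.60)]
scores = [satisfaction_score(p, q) for p, q in pairs]
print(satisfied_machine_ratio(scores))  # 3 of 5 machines satisfied -> 0.6
```

Averaging this quantity over many machines and images is what lets SMR characterize perceptual quality for machines in general, rather than for one specific model.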
Related papers
- Machine vision-aware quality metrics for compressed image and video assessment [0.0]
Modern video-analysis efforts involve so much data that they necessitate machine-vision processing with minimal human intervention.
This paper explores the effects of compression on detection and recognition algorithms.
It introduces novel full-reference image/video-quality metrics for each task, tailored to machine vision.
arXiv Detail & Related papers (2024-11-11T08:07:34Z)
- A Rate-Distortion-Classification Approach for Lossy Image Compression [0.0]
In lossy image compression, the objective is to achieve minimal signal distortion while compressing images to a specified bit rate.
To bridge the gap between image compression and visual analysis, we propose a Rate-Distortion-Classification (RDC) model for lossy image compression.
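A rate-distortion-classification trade-off of the kind this summary describes can be sketched as a weighted objective; the function name, weights, and scales below are illustrative assumptions, not the paper's actual loss:

```python
def rdc_loss(rate_bits, distortion, cls_loss, lam_d=0.01, lam_c=1.0):
    """Weighted RDC objective: bits spent, plus penalties for pixel
    distortion and for degraded downstream classification."""
    return rate_bits + lam_d * distortion + lam_c * cls_loss

# Increasing lam_c pushes the codec to preserve class-relevant detail,
# possibly at the cost of extra bits or pixel-level distortion.
low_acc = rdc_loss(1200.0, 35.0, cls_loss=0.8)
high_acc = rdc_loss(1250.0, 36.0, cls_loss=0.2)
```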
arXiv Detail & Related papers (2024-05-06T14:11:36Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- VVC+M: Plug and Play Scalable Image Coding for Humans and Machines [25.062104976775448]
In scalable coding for humans and machines, the compressed representation used for machines is further utilized to enable input reconstruction.
We propose to utilize the pre-existing residual coding capabilities of video codecs such as VVC to create a scalable codec from any image compression for machines (ICM) scheme.
arXiv Detail & Related papers (2023-05-17T00:22:39Z)
- Modality-Agnostic Variational Compression of Implicit Neural Representations [96.35492043867104]
We introduce a modality-agnostic neural compression algorithm based on a functional view of data, parameterised as an Implicit Neural Representation (INR).
Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism.
After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression.
arXiv Detail & Related papers (2023-01-23T15:22:42Z)
- A New Image Codec Paradigm for Human and Machine Uses [53.48873918537017]
A new scalable image coding paradigm for both human and machine uses is proposed in this work.
The high-level instance segmentation map and the low-level signal features are extracted with neural networks.
An image predictor is designed and trained to achieve general-quality image reconstruction from the 16-bit gray-scale profile and signal features.
arXiv Detail & Related papers (2021-12-19T06:17:38Z)
- Video Coding for Machine: Compact Visual Representation Compression for Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) is committed to bridging the largely separate research tracks of video/image compression and feature compression.
This paper summarizes VCM methodology and philosophy based on existing academia and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z)
- Deep Optimized Multiple Description Image Coding via Scalar Quantization Learning [37.00592782976494]
We introduce a deep multiple description coding (MDC) framework optimized by minimizing multiple description (MD) compressive loss.
An auto-encoder network composed of these two types of network is designed as a symmetrical parameter sharing structure.
Our framework performs better than several state-of-the-art MDC approaches regarding image coding efficiency when tested on several commonly available datasets.
arXiv Detail & Related papers (2020-01-12T05:03:16Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion pattern.
By learning to extract sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.