Video Coding for Machines: A Paradigm of Collaborative Compression and
Intelligent Analytics
- URL: http://arxiv.org/abs/2001.03569v2
- Date: Mon, 13 Jan 2020 16:03:58 GMT
- Title: Video Coding for Machines: A Paradigm of Collaborative Compression and
Intelligent Analytics
- Authors: Ling-Yu Duan, Jiaying Liu, Wenhan Yang, Tiejun Huang, Wen Gao
- Abstract summary: Video coding, which aims to compress and reconstruct whole frames, and feature compression, which preserves and transmits only the most critical information, stand at two ends of the scale.
Recent endeavors in emerging trends of video compression, e.g. deep-learning-based coding tools and end-to-end image/video coding, and MPEG-7 compact feature descriptor standards promote sustained and fast development in their own directions.
In this paper, thanks to booming AI technology, e.g. prediction and generation models, we carry out exploration in the new area, Video Coding for Machines (VCM), arising from the emerging MPEG standardization efforts.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video coding, which aims to compress and reconstruct whole frames, and
feature compression, which preserves and transmits only the most critical
information, stand at two ends of the scale. That is, one offers compactness
and efficiency to serve machine vision, and the other retains full fidelity
for human perception. Recent endeavors in emerging trends of video
compression, e.g. deep-learning-based coding tools and end-to-end image/video
coding, and in MPEG-7 compact feature descriptor standards, i.e. Compact
Descriptors for Visual Search (CDVS) and Compact Descriptors for Video
Analysis (CDVA), have promoted sustained and fast development in their
respective directions. In this paper, thanks to booming AI technology, e.g.
prediction and generation models, we explore a new area, Video Coding for
Machines (VCM), arising from the emerging MPEG standardization efforts.
Towards collaborative compression and intelligent analytics, VCM attempts to
bridge the gap between feature coding for machine vision and video coding for
human vision. Aligning with the rising Analyze then Compress instance Digital
Retina, the definition, formulation, and paradigm of VCM are given first.
Meanwhile, we systematically review state-of-the-art techniques in video
compression and feature compression from the unique perspective of MPEG
standardization, which provides the academic and industrial evidence to realize
the collaborative compression of video and feature streams in a broad range of
AI applications. Finally, we come up with potential VCM solutions, and
preliminary results demonstrate the performance and efficiency gains. Further
directions are discussed as well.
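The collaborative compression of video and feature streams described above can be pictured as a two-layer bitstream: a compact feature layer that machines consume directly, plus a video layer for full-frame reconstruction. The toy sketch below is a hypothetical illustration of that layering only, not the paper's actual method; the stream names, the `zlib` stand-in for a real codec, and the one-byte feature extractor are all invented for exposition:

```python
import zlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class VCMBitstream:
    """Two-layer stream: features for machine vision, video for human vision."""
    feature_layer: bytes  # compact stream, sufficient for machine analysis
    video_layer: bytes    # stream carrying the full-frame reconstruction


def encode(frame: bytes, feature_extractor: Callable[[bytes], bytes]) -> VCMBitstream:
    # Analyze then Compress: extract the compact feature first,
    # then code both layers (zlib stands in for real codecs here).
    feature = feature_extractor(frame)
    return VCMBitstream(zlib.compress(feature), zlib.compress(frame))


def decode_for_machine(bs: VCMBitstream) -> bytes:
    # A machine-vision consumer touches only the cheap feature layer.
    return zlib.decompress(bs.feature_layer)


def decode_for_human(bs: VCMBitstream) -> bytes:
    # A human viewer decodes the full-fidelity video layer.
    return zlib.decompress(bs.video_layer)
```

The point of the split is that an analytics server can stop after `decode_for_machine`, never paying the cost of pixel-level reconstruction.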
Related papers
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z) - NN-VVC: Versatile Video Coding boosted by self-supervisedly learned
image coding for machines [19.183883119933558]
This paper proposes a hybrid codec for machines called NN-VVC, which combines the advantages of an end-to-end learned image codec and a conventional video codec (CVC) to achieve high performance in both image and video coding for machines.
Our experiments show that the proposed system achieved up to -43.20% and -26.8% Bjontegaard Delta rate reduction over VVC for image and video data, respectively.
arXiv Detail & Related papers (2024-01-19T15:33:46Z) - End-to-End Learnable Multi-Scale Feature Compression for VCM [8.037759667748768]
We propose a novel multi-scale feature compression method that enables end-to-end optimization of the extracted features and the design of lightweight encoders.
Our model outperforms previous approaches by at least 52% BD-rate reduction and requires $\times 5$ to $\times 27$ less encoding time for object detection.
arXiv Detail & Related papers (2023-06-29T04:05:13Z) - VNVC: A Versatile Neural Video Coding Framework for Efficient
Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z) - Video Coding for Machine: Compact Visual Representation Compression for
Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) is committed to bridging the largely separate research tracks of video/image compression and feature compression.
This paper summarizes VCM methodology and philosophy based on existing academia and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z) - Content Adaptive and Error Propagation Aware Deep Video Compression [110.31693187153084]
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
arXiv Detail & Related papers (2020-03-25T09:04:24Z) - An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond
Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generative network to reconstruct video frames with the guidance of learned motion patterns.
By learning to extract sparse motion patterns via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
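Several entries above quote gains as Bjontegaard Delta (BD) rate, the standard average-bitrate difference between two rate-distortion curves. A minimal sketch of the usual computation (cubic fit of log-rate as a function of PSNR, integrated over the overlapping quality range) is given below; the function and variable names are my own, and real evaluations follow the reference spreadsheet implementations:

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate difference (%) of the test codec vs. the anchor.

    Negative values mean the test codec needs less bitrate for the
    same quality (e.g. -43.20% in the NN-VVC results above).
    """
    lr_a = np.log10(rate_anchor)
    lr_t = np.log10(rate_test)
    # Cubic polynomial fits: log-rate as a function of PSNR.
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # Overlapping PSNR interval of the two curves.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Integrate each fitted log-rate curve over the common interval.
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    # Back from log10 domain to a percentage bitrate change.
    return (10 ** avg_log_diff - 1) * 100.0
```

For example, a test curve whose rates are uniformly half the anchor's at identical PSNR yields a BD-rate of -50%.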
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.