An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond
Feature and Signal
- URL: http://arxiv.org/abs/2001.03004v1
- Date: Thu, 9 Jan 2020 14:18:18 GMT
- Title: An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond
Feature and Signal
- Authors: Sifeng Xia, Kunchangtai Liang, Wenhan Yang, Ling-Yu Duan and Jiaying
Liu
- Abstract summary: Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames under the guidance of learned motion patterns.
By learning to extract sparse motion patterns via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study a new problem arising from the emerging MPEG
standardization effort Video Coding for Machine (VCM), which aims to bridge the
gap between visual feature compression and classical video coding. VCM is
committed to addressing the requirement of compact signal representation for
both machine and human vision in a more or less scalable way. To this end, we
endeavor to leverage the strength of predictive and generative models to
support advanced compression techniques for both machine and human vision tasks
simultaneously, in which visual features serve as a bridge connecting
signal-level and task-level compact representations in a scalable manner.
Specifically, we employ a conditional deep generation network to reconstruct
video frames under the guidance of learned motion patterns. By learning to
extract sparse motion patterns via a predictive model, the network elegantly
leverages the feature representation to generate the appearance of to-be-coded
frames via a generative model, relying on the appearance of the coded key
frames. Meanwhile, the sparse motion pattern is compact and highly effective
for high-level vision tasks, e.g., action recognition. Experimental results
demonstrate that our method yields much better reconstruction quality than
traditional video codecs (a 0.0063 gain in SSIM), as well as state-of-the-art
action recognition performance on highly compressed videos (a 9.4% gain in
recognition accuracy), showcasing a promising paradigm of coding signals for
both human and machine vision.
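The key-frame-plus-sparse-motion idea in the abstract can be illustrated with a deliberately simplified sketch. The paper uses a learned predictive model to extract sparse motion patterns and a conditional deep generation network to synthesize frames; here both are replaced with toy stand-ins (an exhaustive global-shift search and a circular warp), and all function names (`extract_sparse_motion`, `generate_frame`) are hypothetical, not from the paper.

```python
import numpy as np

def extract_sparse_motion(key_frame, target_frame, search=3):
    """Toy stand-in for the paper's learned predictive model: estimate a
    single global (dy, dx) shift by exhaustive search over a small window."""
    best, best_err = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = np.roll(key_frame, (dy, dx), axis=(0, 1))
            err = np.abs(shifted - target_frame).mean()
            if err < best_err:
                best, best_err = (dy, dx), err
    return best  # the "sparse motion pattern": here just two integers

def generate_frame(key_frame, motion):
    """Toy stand-in for the conditional generation network: reconstruct the
    target frame by warping the coded key frame with the motion pattern."""
    return np.roll(key_frame, motion, axis=(0, 1))

# Usage: the target frame is the key frame circularly shifted by (2, 1),
# so only the tiny motion pattern needs to be transmitted alongside the
# coded key frame; the decoder regenerates the target frame from both.
key = np.arange(64, dtype=float).reshape(8, 8)
target = np.roll(key, (2, 1), axis=(0, 1))
motion = extract_sparse_motion(key, target)
recon = generate_frame(key, motion)
```

The point of the sketch is the bitrate asymmetry the abstract relies on: the key frame carries the appearance at full fidelity once, while each subsequent frame is represented only by a compact motion pattern that is also usable directly for tasks such as action recognition.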
Related papers
- High-Efficiency Neural Video Compression via Hierarchical Predictive Learning [27.41398149573729]
Enhanced Deep Hierarchical Video Compression (DHVC 2.0) introduces superior compression performance and impressive complexity efficiency.
Uses hierarchical predictive coding to transform each video frame into multiscale representations.
Supports transmission-friendly progressive decoding, making it particularly advantageous for networked video applications in the presence of packet loss.
arXiv Detail & Related papers (2024-10-03T15:40:58Z)
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to exploring multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework that targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- Video Coding for Machine: Compact Visual Representation Compression for Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) is committed to bridging, to an extent, the separate research tracks of video/image compression and feature compression.
This paper summarizes VCM methodology and philosophy based on existing academic and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z)
- Towards Modality Transferable Visual Information Representation with Optimal Model Compression [67.89885998586995]
We propose a new scheme for visual signal representation that leverages the philosophy of transferable modality.
The proposed framework is implemented on the state-of-the-art video coding standard.
arXiv Detail & Related papers (2020-08-13T01:52:40Z)
- Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach [104.02201472370801]
We come up with a novel image coding framework by leveraging both compressive and generative models.
By introducing advanced generative models, we train a flexible network to reconstruct images from compact feature representations and the reference pixels.
Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection.
arXiv Detail & Related papers (2020-01-09T10:37:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.