Video Coding for Machine: Compact Visual Representation Compression for
Intelligent Collaborative Analytics
- URL: http://arxiv.org/abs/2110.09241v1
- Date: Mon, 18 Oct 2021 12:42:13 GMT
- Title: Video Coding for Machine: Compact Visual Representation Compression for
Intelligent Collaborative Analytics
- Authors: Wenhan Yang, Haofeng Huang, Yueyu Hu, Ling-Yu Duan, Jiaying Liu
- Abstract summary: Video Coding for Machines (VCM) is committed to bridging the largely separate research tracks of video/image compression and feature compression.
This paper summarizes VCM methodology and philosophy based on existing academia and industrial efforts.
- Score: 101.35754364753409
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video Coding for Machines (VCM) is committed to bridging the largely
separate research tracks of video/image compression and feature compression,
and attempts to jointly optimize compactness and efficiency from a unified
perspective of high-accuracy machine vision and full-fidelity human vision. In
this paper, we summarize VCM methodology and philosophy based on existing
academia and industrial efforts. The development of VCM follows a general
rate-distortion optimization, and the categorization of key modules or
techniques is established. Previous works demonstrate that, although existing
methods attempt to reveal the nature of scalable representation in bits when
dealing with machine and human vision tasks, the generality of low-bit-rate
representations, and accordingly how they can support a variety of visual
analytics tasks, remains rarely studied. Therefore, we investigate a novel
visual information compression for the analytics taxonomy problem to strengthen
the capability of compact visual representations extracted from multiple tasks
for visual analytics. The relationship between machine vision tasks and
compression is revisited from a new perspective. By keeping in mind the
transferability among
different machine vision tasks (e.g. high-level semantic and mid-level
geometry-related), we aim to support multiple tasks jointly at low bit rates.
In particular, to narrow the dimensionality gap between neural network
generated features extracted from pixels and a variety of machine vision
features/labels (e.g. scene class, segmentation labels), a codebook hyperprior
is designed to compress the neural network-generated features. As demonstrated
in our experiments, the new hyperprior model improves feature compression
efficiency by estimating the signal entropy more accurately, which enables
further investigation of the granularity of abstracting compact features
across different tasks.
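The idea of a codebook hyperprior can be sketched in miniature: signal which codeword a feature is closest to, then code the residual under a prior centered on that codeword. The following is a minimal sketch, assuming a Euclidean nearest-codeword assignment and a Gaussian residual model; the function names, the toy codebook, and the scalar `sigma` are illustrative assumptions, not the architecture used in the paper.

```python
import math

def nearest_codeword(feature, codebook):
    """Index of the codeword closest to `feature` in squared L2 distance."""
    def sq_dist(code):
        return sum((f - c) ** 2 for f, c in zip(feature, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))

def estimated_bits(feature, codebook, sigma=1.0):
    """Estimate the coding cost of `feature` under a toy codebook hyperprior.

    Cost = bits to signal the codeword index (log2 of the codebook size)
         + negative log-likelihood, in bits, of the residual under a
           Gaussian centered at the chosen codeword.
    """
    k = nearest_codeword(feature, codebook)
    index_bits = math.log2(len(codebook))
    residual_bits = sum(
        0.5 * math.log2(2 * math.pi * sigma ** 2)
        + (f - c) ** 2 / (2 * sigma ** 2 * math.log(2))
        for f, c in zip(feature, codebook[k])
    )
    return k, index_bits + residual_bits
```

A feature that lands exactly on a codeword pays only the index bits plus the Gaussian normalization term; features farther from every codeword cost more. This is the sense in which a better-fitting prior "estimates the signal entropy more accurately" and lowers the expected bit rate.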
Related papers
- A Multitask Deep Learning Model for Classification and Regression of Hyperspectral Images: Application to the large-scale dataset [44.94304541427113]
We propose a multitask deep learning model to perform multiple classification and regression tasks simultaneously on hyperspectral images.
We validated our approach on a large hyperspectral dataset called TAIGA.
A comprehensive qualitative and quantitative analysis of the results shows that the proposed method significantly outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-23T11:14:54Z)
- Scalable Face Image Coding via StyleGAN Prior: Towards Compression for Human-Machine Collaborative Vision [39.50768518548343]
We investigate how hierarchical representations derived from the advanced generative prior facilitate constructing an efficient scalable coding paradigm for human-machine collaborative vision.
Our key insight is that by exploiting the StyleGAN prior, we can learn three-layered representations encoding hierarchical semantics, which are elaborately designed into the basic, middle, and enhanced layers.
Based on the multi-task scalable rate-distortion objective, the proposed scheme is jointly optimized to achieve optimal machine analysis performance, human perception experience, and compression ratio.
arXiv Detail & Related papers (2023-12-25T05:57:23Z)
- Revisit Visual Representation in Analytics Taxonomy: A Compression Perspective [69.99087941471882]
We study the problem of supporting multiple machine vision analytics tasks with the compressed visual representation.
By utilizing the intrinsic transferability among different tasks, our framework successfully constructs compact and expressive representations at low bit-rates.
In order to impose compactness in the representations, we propose a codebook-based hyperprior.
arXiv Detail & Related papers (2021-06-16T01:44:32Z)
- Towards Analysis-friendly Face Representation with Scalable Feature and Texture Compression [113.30411004622508]
We show that a universal and collaborative visual information representation can be achieved in a hierarchical way.
Based on the strong generative capability of deep neural networks, the gap between the base feature layer and enhancement layer is further filled with the feature level texture reconstruction.
To improve the efficiency of the proposed framework, the base layer neural network is trained in a multi-task manner.
arXiv Detail & Related papers (2020-04-21T14:32:49Z)
- Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics [127.65410486227007]
Video coding, which aims to compress and reconstruct whole frames, and feature compression, which preserves and transmits only the most critical information, stand at two ends of the scale.
Recent endeavors in imminent trends of video compression, e.g. deep-learning-based coding tools and end-to-end image/video coding, and the MPEG-7 compact feature descriptor standards, promote sustainable and rapid development in their own directions.
In this paper, thanks to booming AI technology, e.g. prediction and generation models, we explore the new area of Video Coding for Machines (VCM), arising from the emerging MPEG
arXiv Detail & Related papers (2020-01-10T17:24:13Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames under the guidance of learned motion patterns.
By learning to extract sparse motion patterns via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
- Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach [104.02201472370801]
We come up with a novel image coding framework by leveraging both the compressive and the generative models.
By introducing advanced generative models, we train a flexible network to reconstruct images from compact feature representations and the reference pixels.
Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection.
arXiv Detail & Related papers (2020-01-09T10:37:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.