End-to-End Learnable Multi-Scale Feature Compression for VCM
- URL: http://arxiv.org/abs/2306.16670v3
- Date: Tue, 8 Aug 2023 05:00:58 GMT
- Title: End-to-End Learnable Multi-Scale Feature Compression for VCM
- Authors: Yeongwoong Kim, Hyewon Jeong, Janghyun Yu, Younhee Kim, Jooyoung Lee,
Se Yoon Jeong, and Hui Yong Kim
- Abstract summary: We propose a novel multi-scale feature compression method that enables end-to-end optimization of the extracted features and the design of lightweight encoders.
Our model outperforms previous approaches by at least a 52% BD-rate reduction and requires $\times5$ to $\times27$ less encoding time for object detection.
- Score: 8.037759667748768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of deep learning-based machine vision applications has
given rise to a new type of compression, so-called video coding for machines
(VCM). VCM differs from traditional video coding in that it is optimized for
machine vision performance instead of human visual quality. In the feature
compression track of MPEG-VCM, multi-scale features extracted from images are
subject to compression. Recent feature compression works have demonstrated that
the versatile video coding (VVC) standard-based approach can achieve a BD-rate
reduction of up to 96% against the MPEG-VCM feature anchor. However, it is still
sub-optimal as VVC was not designed for extracted features but for natural
images. Moreover, the high encoding complexity of VVC makes it difficult to
design a lightweight encoder without sacrificing performance. To address these
challenges, we propose a novel multi-scale feature compression method that
enables both end-to-end optimization of the extracted features and the
design of lightweight encoders. The proposed model combines a learnable
compressor with a multi-scale feature fusion network so that the redundancy in
the multi-scale features is effectively removed. Instead of simply cascading
the fusion network and the compression network, we integrate the fusion and
encoding processes in an interleaved way. Our model first encodes a
larger-scale feature to obtain a latent representation and then fuses the
latent with a smaller-scale feature. This process is successively performed
until the smallest-scale feature is fused and then the encoded latent at the
final stage is entropy-coded for transmission. The results show that our model
outperforms previous approaches by at least a 52% BD-rate reduction and requires
$\times5$ to $\times27$ less encoding time for object detection...
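As a rough illustration of the interleaved design described in the abstract, here is a minimal sketch (our reading of the text, not the authors' released code; module names, channel sizes, and the FPN-style inputs are illustrative assumptions). It encodes the largest-scale feature into a latent, fuses that latent with each successively smaller-scale feature, and returns the final latent that would be quantized and entropy-coded:

```python
# Sketch of interleaved encode-then-fuse multi-scale feature compression.
import torch
import torch.nn as nn

class EncodeFuseStage(nn.Module):
    """One stage: encode/downsample the running latent, then fuse it
    with the feature map of the next (smaller) scale."""
    def __init__(self, latent_ch, feat_ch):
        super().__init__()
        # stride-2 conv halves spatial size so the latent matches the
        # resolution of the next, smaller-scale feature map
        self.encode = nn.Conv2d(latent_ch, latent_ch, 3, stride=2, padding=1)
        self.fuse = nn.Conv2d(latent_ch + feat_ch, latent_ch, 1)

    def forward(self, latent, feat):
        z = self.encode(latent)
        return self.fuse(torch.cat([z, feat], dim=1))

class InterleavedMSFC(nn.Module):
    def __init__(self, feat_chs=(256, 256, 256), latent_ch=128):
        super().__init__()
        self.stem = nn.Conv2d(feat_chs[0], latent_ch, 3, padding=1)
        self.stages = nn.ModuleList(
            EncodeFuseStage(latent_ch, c) for c in feat_chs[1:])

    def forward(self, feats):  # feats: largest scale first
        latent = self.stem(feats[0])
        for stage, feat in zip(self.stages, feats[1:]):
            latent = stage(latent, feat)
        return latent  # would be quantized + entropy-coded for transmission

# toy multi-scale features, e.g. FPN levels p3/p4/p5 of one image
feats = [torch.randn(1, 256, 64, 64),
         torch.randn(1, 256, 32, 32),
         torch.randn(1, 256, 16, 16)]
latent = InterleavedMSFC()(feats)
print(latent.shape)  # torch.Size([1, 128, 16, 16])
```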
Related papers
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to support multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- Accelerating Learned Video Compression via Low-Resolution Representation Learning [18.399027308582596]
We introduce an efficiency-optimized framework for learned video compression that focuses on low-resolution representation learning.
Our method achieves performance levels on par with the low-delay P configuration of the H.266 reference software VTM.
arXiv Detail & Related papers (2024-07-23T12:02:57Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
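A minimal sketch of the idea highlighted above, namely running analysis directly on the decoded compact representation instead of fully reconstructing pixels first (all module names and sizes below are illustrative assumptions, not the VNVC architecture):

```python
# Analysis on compact latents; pixel reconstruction only on demand.
import torch
import torch.nn as nn

latent_ch = 96

latent_decoder = nn.Sequential(            # entropy-decoded latent -> compact feature
    nn.Conv2d(latent_ch, latent_ch, 3, padding=1), nn.ReLU())
pixel_head = nn.Sequential(                # only needed when humans view the video
    nn.ConvTranspose2d(latent_ch, 3, 4, stride=2, padding=1))
task_head = nn.Sequential(                 # machine vision consumes features directly
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(latent_ch, 10))

z_hat = torch.randn(1, latent_ch, 32, 32)  # stand-in for an entropy-decoded latent
feat = latent_decoder(z_hat)
logits = task_head(feat)                   # analysis path: no pixel reconstruction
frame = pixel_head(feat)                   # reconstruction path, only on demand
print(logits.shape, frame.shape)
```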
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- Device Interoperability for Learned Image Compression with Weights and Activations Quantization [1.373801677008598]
We present a method to solve the device interoperability problem of a state-of-the-art image compression network.
We suggest a simple method which can ensure cross-platform encoding and decoding, and can be implemented quickly.
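A minimal sketch of why weight/activation quantization helps cross-platform interoperability: if the entropy model is evaluated in integer arithmetic, encoder and decoder obtain bit-identical probability tables on different hardware, whereas float math may differ in the last ulp and derail the arithmetic coder. The scales and bit-widths here are illustrative assumptions:

```python
# Deterministic integer evaluation of a (toy) entropy-model layer.
import numpy as np

def quantize(x, scale, bits=8):
    q = np.clip(np.round(x / scale), -(2**(bits - 1)), 2**(bits - 1) - 1)
    return q.astype(np.int32)

w = np.random.randn(4, 4).astype(np.float32)   # entropy-model weights
a = np.random.randn(4).astype(np.float32)      # activations
w_q, a_q = quantize(w, 0.05), quantize(a, 0.1)

# integer matmul is bit-exact on every platform; only the final
# rescale back to real values uses a fixed, agreed-upon scale
y_int = w_q @ a_q                     # exact int32 arithmetic
y = y_int * (0.05 * 0.1)              # dequantize with the shared scales
print(y_int, y)
```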
arXiv Detail & Related papers (2022-12-02T17:45:29Z)
- Deep Lossy Plus Residual Coding for Lossless and Near-lossless Image Compression [85.93207826513192]
We propose a unified and powerful deep lossy plus residual (DLPR) coding framework for both lossless and near-lossless image compression.
We solve the joint lossy and residual compression problem within the framework of VAEs.
In the near-lossless mode, we quantize the original residuals to satisfy a given $\ell_\infty$ error bound.
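A minimal sketch of near-lossless residual quantization under an $\ell_\infty$ bound tau: with integer residuals and bin width 2*tau + 1, every residual lies within tau of its reconstruction. This is the standard construction; the DLPR entropy model around it is omitted:

```python
# Uniform residual quantization guaranteeing |r - r_hat| <= tau.
import numpy as np

def quantize_residual(r, tau):
    step = 2 * tau + 1                      # odd bin width
    return (np.round(r / step) * step).astype(np.int64)

r = np.random.randint(-50, 51, size=10)     # integer residuals x - x_lossy
for tau in (0, 1, 2, 4):
    r_hat = quantize_residual(r, tau)
    assert np.abs(r - r_hat).max() <= tau   # l_inf error bound holds
    print(tau, np.abs(r - r_hat).max())
```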
arXiv Detail & Related papers (2022-09-11T12:11:56Z)
- Block Modulating Video Compression: An Ultra Low Complexity Image Compression Encoder for Resource Limited Platforms [35.76050232152349]
An ultra low-cost Block Modulating Video Compression (BMVC) encoder is proposed for mobile platforms with limited power and computational resources.
Two types of BMVC decoders, implemented by deep neural networks, are presented.
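A minimal sketch of block-modulation encoding as we read the abstract (this interpretation is an assumption on our part): the image is split into blocks, each block is multiplied by a fixed random binary mask, and the modulated blocks are summed into one block-sized measurement, so encoding is only masking and addition. Block size and compression ratio are illustrative; the deep decoders from the paper are omitted:

```python
# Toy block-modulation encoder: mask each block, sum the results.
import numpy as np

rng = np.random.default_rng(0)
H = W = 128
B = 64                                   # block size; 4 blocks -> 4x compression
img = rng.random((H, W)).astype(np.float32)

blocks = [img[i:i+B, j:j+B] for i in range(0, H, B) for j in range(0, W, B)]
masks = rng.integers(0, 2, size=(len(blocks), B, B)).astype(np.float32)

# the entire "encoder": elementwise modulation plus summation
measurement = sum(m * b for m, b in zip(masks, blocks))
print(measurement.shape)  # (64, 64): one block-sized measurement for 4 blocks
```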
arXiv Detail & Related papers (2022-05-07T16:20:09Z)
- Microdosing: Knowledge Distillation for GAN based Compression [18.140328230701233]
We show how to leverage knowledge distillation to obtain equally capable image decoders at a fraction of the original number of parameters.
This allows us to reduce the model size by a factor of 20 and to achieve a 50% reduction in decoding time.
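A minimal sketch of decoder distillation as described above: a small student decoder is trained to reproduce a large, frozen teacher decoder's output from the same latent. Architectures and the plain L1 loss are illustrative assumptions; the paper's GAN-based training is omitted:

```python
# Distilling a large image decoder into a much smaller student.
import torch
import torch.nn as nn

latent_ch = 192
teacher = nn.Sequential(  # large pretrained decoder (kept frozen)
    nn.ConvTranspose2d(latent_ch, 256, 4, 2, 1), nn.ReLU(),
    nn.ConvTranspose2d(256, 3, 4, 2, 1))
student = nn.Sequential(  # far fewer parameters
    nn.ConvTranspose2d(latent_ch, 32, 4, 2, 1), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, 2, 1))

opt = torch.optim.Adam(student.parameters(), lr=1e-4)
for _ in range(3):                      # toy training loop
    z = torch.randn(2, latent_ch, 16, 16)
    with torch.no_grad():
        target = teacher(z)             # teacher output serves as the label
    loss = nn.functional.l1_loss(student(z), target)
    opt.zero_grad(); loss.backward(); opt.step()
    print(float(loss))
```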
arXiv Detail & Related papers (2022-01-07T14:27:16Z)
- Conditional Entropy Coding for Efficient Video Compression [82.35389813794372]
We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames.
We first show that a simple architecture modeling the entropy between the image latent codes is competitive with other neural video compression works and video codecs.
We then propose a novel internal learning extension on top of this architecture that brings an additional 10% savings without trading off decoding speed.
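A minimal sketch of conditional entropy modeling between frames: a small network predicts the mean/scale of the current latent given the previous one, and the bitrate is estimated as the negative log-likelihood of the integer bins. This mirrors the general idea only, with illustrative shapes, not the paper's architecture:

```python
# Rate estimate from a conditional entropy model p(z_t | z_{t-1}).
import torch
import torch.nn as nn

C = 64
cond_net = nn.Sequential(
    nn.Conv2d(C, C, 3, padding=1), nn.ReLU(),
    nn.Conv2d(C, 2 * C, 3, padding=1))   # outputs mean and log-scale

z_prev = torch.randn(1, C, 16, 16)
z_t = torch.round(torch.randn(1, C, 16, 16) * 3)   # quantized current latent

mean, log_scale = cond_net(z_prev).chunk(2, dim=1)
dist = torch.distributions.Laplace(mean, log_scale.exp())
# probability mass of the integer bin [z - 0.5, z + 0.5]
p = dist.cdf(z_t + 0.5) - dist.cdf(z_t - 0.5)
bits = -torch.log2(p.clamp_min(1e-9)).sum()
print(float(bits))                       # estimated bits to code z_t
```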
arXiv Detail & Related papers (2020-08-20T20:01:59Z)
- A Unified End-to-End Framework for Efficient Deep Image Compression [35.156677716140635]
We propose a unified framework called Efficient Deep Image Compression (EDIC) based on three new technologies.
Specifically, we design an auto-encoder style network for learning-based image compression.
Our EDIC method can also be readily incorporated with the Deep Video Compression (DVC) framework to further improve the video compression performance.
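A minimal sketch of the auto-encoder style learned image compression the summary mentions: analysis transform, a differentiable quantization proxy (uniform noise at training time), synthesis transform, and a rate-distortion loss. The entropy model is reduced to a fixed Laplace prior for brevity; all sizes are illustrative assumptions:

```python
# Toy rate-distortion training step for an auto-encoder codec.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Conv2d(3, 128, 5, 2, 2), nn.ReLU(),
                    nn.Conv2d(128, 192, 5, 2, 2))
dec = nn.Sequential(nn.ConvTranspose2d(192, 128, 5, 2, 2, output_padding=1),
                    nn.ReLU(),
                    nn.ConvTranspose2d(128, 3, 5, 2, 2, output_padding=1))

x = torch.rand(1, 3, 64, 64)
y = enc(x)
y_hat = y + torch.empty_like(y).uniform_(-0.5, 0.5)   # quantization proxy
x_hat = dec(y_hat)

prior = torch.distributions.Laplace(0.0, 1.0)
rate = -prior.log_prob(y_hat).sum() / (64 * 64)       # rate per pixel (in nats)
distortion = nn.functional.mse_loss(x_hat, x)
loss = rate * 0.01 + distortion                       # lambda trades rate vs quality
print(float(rate), float(distortion))
```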
arXiv Detail & Related papers (2020-02-09T14:21:08Z)
- Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics [127.65410486227007]
Video coding, which aims to compress and reconstruct whole frames, and feature compression, which preserves and transmits only the most critical information, stand at the two ends of the scale.
Recent endeavors in video compression, e.g. deep learning based coding tools, end-to-end image/video coding, and the MPEG-7 compact feature descriptor standards, promote sustainable and fast development in their own directions.
In this paper, thanks to booming AI technology, e.g. prediction and generation models, we explore the new area of Video Coding for Machines (VCM), arising from the emerging MPEG...
arXiv Detail & Related papers (2020-01-10T17:24:13Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of a learned motion pattern.
By learning to extract a sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of the to-be-coded frames.
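A minimal sketch of conditional frame generation guided by a sparse motion pattern, as the summary describes: a compact appearance feature plus a few motion parameters condition a generator that synthesizes the frame. The modules below are placeholders we invented for illustration, not the paper's network:

```python
# Toy generator conditioned on appearance features and sparse motion.
import torch
import torch.nn as nn

class MotionConditionedGenerator(nn.Module):
    def __init__(self, feat_ch=64, motion_dim=16):
        super().__init__()
        self.motion_proj = nn.Linear(motion_dim, feat_ch)
        self.generate = nn.Sequential(
            nn.ConvTranspose2d(feat_ch, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1))

    def forward(self, appearance, motion):
        # broadcast the sparse motion code over the appearance feature map
        m = self.motion_proj(motion)[:, :, None, None]
        return self.generate(appearance + m)

appearance = torch.randn(1, 64, 16, 16)   # learned feature of a key frame
motion = torch.randn(1, 16)               # sparse motion pattern for frame t
frame = MotionConditionedGenerator()(appearance, motion)
print(frame.shape)                        # torch.Size([1, 3, 64, 64])
```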
arXiv Detail & Related papers (2020-01-09T14:18:18Z)