Multi-Scale Deformable Alignment and Content-Adaptive Inference for
Flexible-Rate Bi-Directional Video Compression
- URL: http://arxiv.org/abs/2306.16544v1
- Date: Wed, 28 Jun 2023 20:32:16 GMT
- Title: Multi-Scale Deformable Alignment and Content-Adaptive Inference for
Flexible-Rate Bi-Directional Video Compression
- Authors: M. Akın Yılmaz, O. Ugur Ulas, A. Murat Tekalp
- Abstract summary: This paper proposes an adaptive motion-compensation model for end-to-end rate-distortion optimized hierarchical bi-directional video compression.
We employ a gain unit, which enables a single model to operate at multiple rate-distortion operating points.
Experimental results demonstrate state-of-the-art rate-distortion performance exceeding that of all prior art in learned video coding.
- Score: 8.80688035831646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The lack of ability to adapt the motion compensation model to video content
is an important limitation of current end-to-end learned video compression
models. This paper advances the state-of-the-art by proposing an adaptive
motion-compensation model for end-to-end rate-distortion optimized hierarchical
bi-directional video compression. In particular, we propose two novelties: i) a
multi-scale deformable alignment scheme at the feature level combined with
multi-scale conditional coding, ii) motion-content adaptive inference. In
addition, we employ a gain unit, which enables a single model to operate at
multiple rate-distortion operating points. We also exploit the gain unit to
control bit allocation among intra-coded vs. bi-directionally coded frames by
fine-tuning the corresponding models for truly flexible-rate learned video coding.
Experimental results demonstrate state-of-the-art rate-distortion performance
exceeding that of all prior art in learned video coding.
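The gain unit mentioned in the abstract can be understood as channel-wise scaling of the latent tensor before quantization, with interpolation between trained gain vectors to reach intermediate rate-distortion points. The following is a minimal NumPy sketch of that idea; the gain values, channel count, and log-domain interpolation rule are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Hypothetical trained gain vectors for two rate-distortion anchors,
# one scale factor per latent channel (values are illustrative).
C = 4
gain_lo = np.array([0.5, 0.6, 0.4, 0.7])   # low-rate anchor
gain_hi = np.array([2.0, 1.8, 2.2, 1.6])   # high-rate anchor

def interpolated_gain(l):
    """Blend the two anchor gain vectors in the log domain.

    l = 0 reproduces the low-rate point, l = 1 the high-rate point,
    and intermediate l yields a continuum of operating points.
    """
    return gain_lo ** (1.0 - l) * gain_hi ** l

def encode_latent(y, l):
    # Scale each channel before rounding (quantization); the inverse
    # gain is applied at the decoder to undo the scaling.
    g = interpolated_gain(l)[:, None, None]
    return np.round(y * g), 1.0 / g

# Toy latent of shape (channels, height, width).
y = np.random.randn(C, 8, 8)
y_hat, inv_gain = encode_latent(y, l=0.5)
y_rec = y_hat * inv_gain   # decoder-side reconstruction of the latent
```

Larger gains shrink the quantization step relative to the signal, so more bits are spent and distortion drops; a single trained model thus covers the whole rate range by sweeping `l`.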
Related papers
- Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization [13.341123726068652]
We propose a novel Multi-granularity Temporal Trajectory Factorization framework for generative human video compression.
Experimental results show that proposed method outperforms latest generative models and the state-of-the-art video coding standard Versatile Video Coding.
arXiv Detail & Related papers (2024-10-14T05:34:32Z)
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- Boost Video Frame Interpolation via Motion Adaptation [73.42573856943923]
Video frame interpolation (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames in a video.
Existing learning-based VFI methods have achieved great success, but they still suffer from limited generalization ability.
We propose a novel optimization-based VFI method that can adapt to unseen motions at test time.
arXiv Detail & Related papers (2023-06-24T10:44:02Z)
- Learned Video Compression via Heterogeneous Deformable Compensation Network [78.72508633457392]
We propose a learned video compression framework via heterogeneous deformable compensation strategy (HDCVC) to tackle the problems of unstable compression performance.
More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-neighborhood heterogeneous deformable (HetDeform) kernel offsets.
Experimental results indicate that HDCVC achieves performance superior to the recent state-of-the-art learned video compression approaches.
arXiv Detail & Related papers (2022-07-11T02:31:31Z)
- Flexible-Rate Learned Hierarchical Bi-Directional Video Compression With Motion Refinement and Frame-Level Bit Allocation [8.80688035831646]
We combine motion estimation and prediction modules and compress refined residual motion vectors for improved rate-distortion performance.
We exploit the gain unit to control bit allocation among intra-coded vs. bi-directionally coded frames.
arXiv Detail & Related papers (2022-06-27T20:18:52Z)
- Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z)
- End-to-end Neural Video Coding Using a Compound Spatiotemporal Representation [33.54844063875569]
We propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by two approaches.
Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module.
We further design a one-to-many decoder pipeline to generate multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps and texture enhancements.
arXiv Detail & Related papers (2021-08-05T19:43:32Z)
- EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate dynamic-scale spatio-temporal kernels to adaptively fit the diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
- End-to-End Rate-Distortion Optimization for Bi-Directional Learned Video Compression [10.404162481860634]
Learned video compression allows end-to-end rate-distortion optimized training of all nonlinear modules.
In this paper, we propose for the first time end-to-end optimization of a hierarchical, bi-directional motion-compensated codec by accumulating the cost function over fixed-size groups of pictures.
arXiv Detail & Related papers (2020-08-11T22:50:06Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion pattern.
By learning to extract sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.