Versatile Learned Video Compression
- URL: http://arxiv.org/abs/2111.03386v1
- Date: Fri, 5 Nov 2021 10:50:37 GMT
- Title: Versatile Learned Video Compression
- Authors: Runsen Feng, Zongyu Guo, Zhizheng Zhang, Zhibo Chen
- Abstract summary: We propose a versatile learned video compression (VLVC) framework that uses one model to support all possible prediction modes.
Specifically, to realize versatile compression, we first build a motion compensation module that applies multiple 3D motion vector fields.
We show that the flow prediction module can largely reduce the transmission cost of voxel flows.
- Score: 26.976302025254043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learned video compression methods have demonstrated great promise in catching
up with traditional video codecs in their rate-distortion (R-D) performance.
However, existing learned video compression schemes are limited by the binding
of the prediction mode and the fixed network framework. They are unable to
support various inter prediction modes and thus inapplicable for various
scenarios. In this paper, to break this limitation, we propose a versatile
learned video compression (VLVC) framework that uses one model to support all
possible prediction modes. Specifically, to realize versatile compression, we
first build a motion compensation module that applies multiple 3D motion vector
fields (i.e., voxel flows) for weighted trilinear warping in spatial-temporal
space. The voxel flows convey the information of temporal reference position
that helps to decouple inter prediction modes away from framework designing.
Secondly, in case of multiple-reference-frame prediction, we apply a flow
prediction module to predict accurate motion trajectories with a unified
polynomial function. We show that the flow prediction module can largely reduce
the transmission cost of voxel flows. Experimental results demonstrate that our
proposed VLVC not only supports versatile compression in various settings but
also achieves comparable R-D performance with the latest VVC standard in terms
of MS-SSIM.
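The weighted trilinear warping described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It makes several assumptions that the abstract does not state: reference frames are grayscale and stacked as a (T, H, W) volume, each voxel flow stores (dx, dy, t) per pixel with t as an absolute temporal coordinate into that stack, out-of-range coordinates are clamped to the border, and the per-flow weights are normalized by their sum. The names `trilinear_sample` and `voxel_flow_warp` are hypothetical.

```python
import numpy as np


def trilinear_sample(refs, x, y, t):
    """Sample a (T, H, W) reference volume at continuous coords x, y, t (each (H, W))."""
    T, H, W = refs.shape
    # Assumption: clamp coordinates to the volume (border padding).
    x = np.clip(x, 0, W - 1)
    y = np.clip(y, 0, H - 1)
    t = np.clip(t, 0, T - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    t0 = np.floor(t).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    t1 = np.minimum(t0 + 1, T - 1)
    fx, fy, ft = x - x0, y - y0, t - t0
    out = np.zeros(x.shape, dtype=float)
    # Blend the 8 neighbouring voxels with separable linear weights.
    for ti, wt in ((t0, 1 - ft), (t1, ft)):
        for yi, wy in ((y0, 1 - fy), (y1, fy)):
            for xi, wx in ((x0, 1 - fx), (x1, fx)):
                out += wt * wy * wx * refs[ti, yi, xi]
    return out


def voxel_flow_warp(refs, flows, weights):
    """Warp with K voxel flows.

    refs:    (T, H, W) stacked reference frames.
    flows:   (K, H, W, 3) with channels (dx, dy, t); layout is an assumption.
    weights: (K, H, W) non-negative blending weights.
    """
    _, H, W = refs.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Assumption: weights are normalized to sum to 1 at each pixel.
    weights = weights / weights.sum(axis=0, keepdims=True)
    out = np.zeros((H, W), dtype=float)
    for flow, w in zip(flows, weights):
        out += w * trilinear_sample(
            refs, xs + flow[..., 0], ys + flow[..., 1], flow[..., 2]
        )
    return out


# Usage: two reference frames, one flow pointing halfway between them in time.
frame0 = np.arange(12.0).reshape(3, 4)
refs = np.stack([frame0, frame0 + 10.0])
flows = np.zeros((1, 3, 4, 3))
flows[..., 2] = 0.5
prediction = voxel_flow_warp(refs, flows, np.ones((1, 3, 4)))
```

Because the temporal coordinate is part of each flow, the same function covers forward, backward, or interpolated prediction simply by changing which frames are stacked in `refs` and where t points. This is one way to read the abstract's claim that voxel flows decouple the prediction mode from the network design.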
Related papers
- IBVC: Interpolation-driven B-frame Video Compression [68.18440522300536]
B-frame video compression aims to adopt bi-directional motion estimation and motion compensation (MEMC) coding for middle frame reconstruction.
Previous learned approaches often directly extend neural P-frame codecs to B-frame relying on bi-directional optical-flow estimation.
We propose a simple yet effective structure called Interpolation-B-frame Video Compression (IBVC) to address these issues.
arXiv Detail & Related papers (2023-09-25T02:45:51Z)
- Spatial-Temporal Transformer based Video Compression Framework [44.723459144708286]
We propose a novel Spatial-Temporal Transformer based Video Compression (STT-VC) framework.
It contains a Relaxed Deformable Transformer (RDT) with Uformer based offsets estimation for motion estimation and compensation, a Multi-Granularity Prediction (MGP) module based on multi-reference frames for prediction refinement, and a Spatial Feature Distribution prior based Transformer (SFD-T) for efficient temporal-spatial joint residual compression.
Experimental results demonstrate that our method achieves the best result with 13.5% BD-Rate saving over VTM.
arXiv Detail & Related papers (2023-09-21T09:23:13Z)
- MMVC: Learned Multi-Mode Video Compression with Block-based Prediction Mode Selection and Density-Adaptive Entropy Coding [21.147001610347832]
We propose a multi-mode video compression framework that selects the optimal mode for feature domain prediction adapting to different motion patterns.
For entropy coding, we consider both dense and sparse post-quantization residual blocks, and apply optional run-length coding to sparse residuals to improve the compression rate.
Compared with state-of-the-art video compression schemes and standard codecs, our method yields better or competitive results measured with PSNR and MS-SSIM.
arXiv Detail & Related papers (2023-04-05T07:37:48Z)
- Scene Matters: Model-based Deep Video Compression [13.329074811293292]
We propose a model-based video compression (MVC) framework that regards scenes as the fundamental units for video sequences.
Our proposed MVC directly models novel intensity variation of the entire video sequence in one scene, seeking non-redundant representations instead of reducing redundancy.
Our method achieves up to a 20% bitrate reduction compared to the latest video standard H.266 and is more efficient in decoding than existing video coding strategies.
arXiv Detail & Related papers (2023-03-08T13:15:19Z)
- H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions [63.23985601478339]
We propose a simple yet effective solution, H-VFI, to deal with large motions in video frame interpolation.
H-VFI contributes a hierarchical video transformer to learn a deformable kernel in a coarse-to-fine strategy.
The advantage of such a progressive approximation is that the large-motion frame interpolation problem can be decomposed into several relatively simpler sub-tasks.
arXiv Detail & Related papers (2022-11-21T09:49:23Z)
- Neighbor Correspondence Matching for Flow-based Video Frame Synthesis [90.14161060260012]
We introduce a neighbor correspondence matching (NCM) algorithm for flow-based frame synthesis.
NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhoods of each pixel.
The coarse-scale module is designed to leverage neighbor correspondences to capture large motion, while the fine-scale module is more efficient to speed up the estimation process.
arXiv Detail & Related papers (2022-07-14T09:17:00Z)
- Coarse-to-fine Deep Video Coding with Hyperprior-guided Mode Prediction [50.361427832256524]
We propose a coarse-to-fine (C2F) deep video compression framework for better motion compensation.
Our C2F framework can achieve better motion compensation results without significantly increasing bit costs.
arXiv Detail & Related papers (2022-06-15T11:38:53Z)
- End-to-End Rate-Distortion Optimized Learned Hierarchical Bi-Directional Video Compression [10.885590093103344]
Learned VC allows end-to-end rate-distortion (R-D) optimized training of nonlinear transform, motion and entropy model simultaneously.
This paper proposes a learned hierarchical bi-directional video codec (LHBDC) that combines the benefits of hierarchical motion-sampling and end-to-end optimization.
arXiv Detail & Related papers (2021-12-17T14:30:22Z)
- FVC: A New Framework towards Deep Video Compression in Feature Space [21.410266039564803]
We propose a feature-space video coding network (FVC) by performing all major operations (i.e., motion estimation, motion compression, motion compensation and residual compression) in the feature space.
The proposed framework achieves the state-of-the-art performance on four benchmark datasets including HEVC, UVG, VTL and MCL-JCV.
arXiv Detail & Related papers (2021-05-20T08:55:32Z)
- FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation [97.99012124785177]
FLAVR is a flexible and efficient architecture that uses 3D space-time convolutions to enable end-to-end learning and inference for video frame interpolation.
We demonstrate that FLAVR can serve as a useful self-supervised pretext task for action recognition, optical flow estimation, and motion magnification.
arXiv Detail & Related papers (2020-12-15T18:59:30Z)
- M-LVC: Multiple Frames Prediction for Learned Video Compression [111.50760486258993]
We propose an end-to-end learned video compression scheme for low-latency scenarios.
In our scheme, the motion vector (MV) field is calculated between the current frame and the previous one.
Experimental results show that the proposed method outperforms the existing learned video compression methods for low-latency mode.
arXiv Detail & Related papers (2020-04-21T20:42:02Z)