MMVC: Learned Multi-Mode Video Compression with Block-based Prediction Mode Selection and Density-Adaptive Entropy Coding
- URL: http://arxiv.org/abs/2304.02273v1
- Date: Wed, 5 Apr 2023 07:37:48 GMT
- Title: MMVC: Learned Multi-Mode Video Compression with Block-based Prediction Mode Selection and Density-Adaptive Entropy Coding
- Authors: Bowen Liu, Yu Chen, Rakesh Chowdary Machineni, Shiyu Liu, Hun-Seok Kim
- Abstract summary: We propose a multi-mode video compression framework that selects the optimal mode for feature domain prediction, adapting to different motion patterns.
For entropy coding, we consider both dense and sparse post-quantization residual blocks, and apply optional run-length coding to sparse residuals to improve the compression rate.
Compared with state-of-the-art video compression schemes and standard codecs, our method yields better or competitive results measured with PSNR and MS-SSIM.
- Score: 21.147001610347832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning-based video compression has been studied extensively in recent years, but it still has limitations in adapting to various motion patterns and entropy models. In this paper, we propose multi-mode video compression (MMVC), a block-wise mode-ensemble deep video compression framework that selects the optimal mode for feature domain prediction, adapting to different motion patterns. The proposed modes include ConvLSTM-based feature domain prediction, optical flow conditioned feature domain prediction, and feature propagation, covering a wide range of cases from static scenes without apparent motion to dynamic scenes with a moving camera. We partition the feature space into blocks for temporal prediction in spatial block-based representations. For entropy coding, we consider both dense and sparse post-quantization residual blocks, and apply optional run-length coding to sparse residuals to improve the compression rate. Our method thus uses a dual-mode entropy coding scheme guided by a binary density map, which offers a rate reduction that outweighs the extra cost of transmitting the binary selection map. We validate our scheme on widely used benchmark datasets. Compared with state-of-the-art video compression schemes and standard codecs, our method yields better or competitive results measured in PSNR and MS-SSIM.
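To make the two core ideas concrete, here is a minimal Python sketch of block-wise prediction mode selection and density-map-guided dual-mode entropy coding. All names, the L1 rate proxy, and the sparsity threshold are illustrative assumptions; the paper's actual predictors are learned networks and its dense-path entropy coder is not shown.

```python
# Hypothetical sketch of MMVC's two ideas: (1) per-block selection of the
# best prediction mode, (2) density-map-guided dual-mode entropy coding.
import numpy as np

BLOCK = 8  # spatial block size in the feature domain (assumed)

def predict_modes(prev_feat, flow_feat):
    """Stand-ins for the paper's modes: ConvLSTM prediction,
    flow-conditioned prediction, and plain feature propagation."""
    return {
        "convlstm": 0.9 * prev_feat,        # placeholder for a learned predictor
        "flow":     prev_feat + flow_feat,  # placeholder for flow conditioning
        "skip":     prev_feat,              # feature propagation (static scenes)
    }

def select_mode(cur_feat, candidates):
    """Pick the mode whose residual costs fewest bits (L1 norm as a proxy)."""
    costs = {m: np.abs(cur_feat - p).sum() for m, p in candidates.items()}
    best = min(costs, key=costs.get)
    return best, cur_feat - candidates[best]

def run_length_encode(flat):
    """(value, run) pairs; effective when most quantized values are zero."""
    pairs, run = [], 1
    for prev, cur in zip(flat[:-1], flat[1:]):
        if cur == prev:
            run += 1
        else:
            pairs.append((int(prev), run))
            run = 1
    pairs.append((int(flat[-1]), run))
    return pairs

def encode_block(cur_feat, prev_feat, flow_feat):
    mode, residual = select_mode(cur_feat, predict_modes(prev_feat, flow_feat))
    q = np.round(residual).astype(np.int32)   # post-quantization residual
    dense = (q != 0).mean() > 0.1             # assumed density rule
    # One density-map bit per block selects the entropy-coding path:
    # dense blocks -> regular entropy coder; sparse blocks -> run-length coding.
    payload = q if dense else run_length_encode(q.ravel())
    return {"mode": mode, "dense": dense, "payload": payload}

rng = np.random.default_rng(0)
prev = rng.normal(size=(BLOCK, BLOCK))
out = encode_block(prev + 0.05, prev, np.zeros((BLOCK, BLOCK)))
print(out["mode"], out["dense"])
```

The per-block density bit corresponds to the binary selection map in the abstract: the rate saved by run-length coding sparse blocks is what must outweigh this one-bit-per-block signaling cost.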
Related papers
- High-Efficiency Neural Video Compression via Hierarchical Predictive Learning [27.41398149573729]
Enhanced Deep Hierarchical Video Compression (DHVC 2.0) delivers superior compression performance and impressive complexity efficiency.
Uses hierarchical predictive coding to transform each video frame into multiscale representations.
Supports transmission-friendly progressive decoding, making it particularly advantageous for networked video applications in the presence of packet loss.
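As a rough illustration of hierarchical predictive coding (not DHVC 2.0's learned transforms), the sketch below decomposes a frame into a coarsest base plus per-scale residuals, each scale predicted from the upsampled coarser one; decoding can stop at any scale, which is what enables progressive decoding.

```python
# Generic coarse-to-fine predictive decomposition (lossless, illustrative).
import numpy as np

def down2(x):
    """2x average-pool downsampling."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up2(x, shape):
    """Nearest-neighbor upsampling back to `shape`."""
    y = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)
    return y[:shape[0], :shape[1]]

def analyze(frame, levels=3):
    """Each scale stores only its residual w.r.t. the coarser prediction."""
    pyramid = [frame]
    for _ in range(levels - 1):
        pyramid.append(down2(pyramid[-1]))
    residuals = []
    for i in range(levels - 2, -1, -1):   # coarse -> fine
        pred = up2(pyramid[i + 1], pyramid[i].shape)
        residuals.append(pyramid[i] - pred)
    return pyramid[-1], residuals

def synthesize(coarsest, residuals):
    rec = coarsest
    for res in residuals:                 # stopping early yields a preview,
        rec = up2(rec, res.shape) + res   # i.e. progressive decoding
    return rec

frame = np.random.default_rng(1).normal(size=(32, 32))
base, res = analyze(frame)
assert np.allclose(synthesize(base, res), frame)
```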
arXiv Detail & Related papers (2024-10-03T15:40:58Z)
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- Scene Matters: Model-based Deep Video Compression [13.329074811293292]
We propose a model-based video compression (MVC) framework that regards scenes as the fundamental units for video sequences.
The proposed MVC directly models the intensity variation of the entire video sequence within a scene, seeking non-redundant representations instead of reducing redundancy.
Our method achieves up to a 20% bitrate reduction compared to the latest video coding standard H.266 and is more efficient in decoding than existing video coding strategies.
arXiv Detail & Related papers (2023-03-08T13:15:19Z)
- Neighbor Correspondence Matching for Flow-based Video Frame Synthesis [90.14161060260012]
We introduce a neighbor correspondence matching (NCM) algorithm for flow-based frame synthesis.
NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhoods of each pixel.
A coarse-scale module is designed to leverage neighbor correspondences to capture large motion, while a fine-scale module speeds up the estimation process for efficiency.
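A minimal sketch of the matching step, under the assumption that it reduces to correlating features of the two reference frames over a local window (the actual NCM design is more elaborate):

```python
# Current-frame-agnostic correspondence matching between the two reference
# frames f0 (t-1) and f1 (t+1); names and the plain dot-product cost are
# assumptions, not the NCM paper's exact formulation.
import torch
import torch.nn.functional as F

def local_cost_volume(f0, f1, radius=3):
    """Correlate each f0 pixel with a (2r+1)^2 neighborhood in f1.

    f0, f1: (B, C, H, W) feature maps. Returns (B, (2r+1)^2, H, W) scores;
    the argmax channel per pixel indexes the best-matching displacement.
    """
    k = 2 * radius + 1
    # Unfold f1 into per-pixel neighborhoods: (B, C*k*k, H*W)
    patches = F.unfold(f1, kernel_size=k, padding=radius)
    B, C, H, W = f0.shape
    patches = patches.view(B, C, k * k, H, W)
    # Dot product of each f0 feature with every neighboring f1 feature.
    return (f0.unsqueeze(2) * patches).sum(dim=1)  # (B, k*k, H, W)

f0 = torch.randn(1, 16, 32, 32)
f1 = torch.roll(f0, shifts=2, dims=3)   # simulate horizontal motion of 2 px
best = local_cost_volume(f0, f1).argmax(dim=1)  # displacement index per pixel
```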
arXiv Detail & Related papers (2022-07-14T09:17:00Z)
- Coarse-to-fine Deep Video Coding with Hyperprior-guided Mode Prediction [50.361427832256524]
We propose a coarse-to-fine (C2F) deep video compression framework for better motion compensation.
Our C2F framework can achieve better motion compensation results without significantly increasing bit costs.
arXiv Detail & Related papers (2022-06-15T11:38:53Z)
- Efficient VVC Intra Prediction Based on Deep Feature Fusion and Probability Estimation [57.66773945887832]
We propose to optimize the complexity of Versatile Video Coding (VVC) intra-frame prediction with a two-stage framework of deep feature fusion and probability estimation.
Experimental results on standard databases demonstrate the superiority of the proposed method, especially for High-Definition (HD) and Ultra-HD (UHD) video sequences.
arXiv Detail & Related papers (2022-05-07T08:01:32Z)
- Versatile Learned Video Compression [26.976302025254043]
We propose a versatile learned video compression (VLVC) framework that uses one model to support all possible prediction modes.
Specifically, to realize versatile compression, we first build a motion compensation module that applies multiple 3D motion vector fields.
We show that the flow prediction module can largely reduce the transmission cost of voxel flows.
arXiv Detail & Related papers (2021-11-05T10:50:37Z)
- Self-Supervised Learning of Perceptually Optimized Block Motion Estimates for Video Compression [50.48504867843605]
We propose a search-free block motion estimation framework using a multi-stage convolutional neural network.
We deploy the multi-scale structural similarity (MS-SSIM) loss function to optimize the perceptual quality of the motion compensated predicted frames.
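For illustration, a single-scale SSIM loss (uniform window) can be written as below; MS-SSIM, as used in the paper, applies the same statistics across several downsampled scales and combines them. Constants and the window size follow common SSIM defaults.

```python
# Differentiable SSIM-style loss for motion-compensated prediction.
import torch
import torch.nn.functional as F

def _mean(x, window, pad):
    # Local mean via average pooling; exclude zero padding from the average.
    return F.avg_pool2d(x, window, stride=1, padding=pad,
                        count_include_pad=False)

def ssim_loss(pred, target, window=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """Return 1 - mean SSIM; inputs are (B, C, H, W) in [0, 1]."""
    pad = window // 2
    mu_p, mu_t = _mean(pred, window, pad), _mean(target, window, pad)
    var_p = _mean(pred * pred, window, pad) - mu_p ** 2
    var_t = _mean(target * target, window, pad) - mu_t ** 2
    cov = _mean(pred * target, window, pad) - mu_p * mu_t
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    return 1.0 - ssim.mean()

pred = torch.rand(1, 3, 64, 64, requires_grad=True)
target = torch.rand(1, 3, 64, 64)
ssim_loss(pred, target).backward()  # gradients can train the motion estimator
```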
arXiv Detail & Related papers (2021-10-05T03:38:43Z)
- End-to-end Neural Video Coding Using a Compound Spatiotemporal Representation [33.54844063875569]
We propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by two approaches.
Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module.
We further design a one-to-many decoder pipeline to generate multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps and texture enhancements.
arXiv Detail & Related papers (2021-08-05T19:43:32Z)
- Neural Video Coding using Multiscale Motion Compensation and Spatiotemporal Context Model [45.46660511313426]
We propose an end-to-end deep neural video coding framework (NVC).
It uses variational autoencoders (VAEs) with joint spatial and temporal prior aggregation (PA) to exploit the correlations in intra-frame pixels, inter-frame motions and inter-frame compensation residuals.
NVC is evaluated in low-delay causal settings and compared with H.265/HEVC, H.264/AVC, and other learned video compression methods.
arXiv Detail & Related papers (2020-07-09T06:15:17Z)
- M-LVC: Multiple Frames Prediction for Learned Video Compression [111.50760486258993]
We propose an end-to-end learned video compression scheme for low-latency scenarios.
In our scheme, the motion vector (MV) field is calculated between the current frame and the previous one.
Experimental results show that the proposed method outperforms the existing learned video compression methods for low-latency mode.
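The core building block of such flow-based schemes is backward warping of the previous frame by a motion vector field; a generic PyTorch sketch (not M-LVC's exact pipeline, which predicts MVs from multiple past frames) is:

```python
# Backward warping: sample the previous frame at flow-displaced positions.
import torch
import torch.nn.functional as F

def warp(prev_frame, flow):
    """prev_frame: (B, C, H, W); flow: (B, 2, H, W) in pixels (dx, dy)."""
    B, _, H, W = prev_frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    x = xs.unsqueeze(0) + flow[:, 0]   # absolute sample coordinates
    y = ys.unsqueeze(0) + flow[:, 1]
    # grid_sample expects coordinates normalized to [-1, 1]
    grid = torch.stack(
        (2.0 * x / (W - 1) - 1.0, 2.0 * y / (H - 1) - 1.0), dim=-1)
    return F.grid_sample(prev_frame, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)

prev = torch.rand(1, 3, 64, 64)
flow = torch.zeros(1, 2, 64, 64)   # zero motion -> prediction == prev frame
assert torch.allclose(warp(prev, flow), prev, atol=1e-5)
```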
arXiv Detail & Related papers (2020-04-21T20:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.