IBVC: Interpolation-driven B-frame Video Compression
- URL: http://arxiv.org/abs/2309.13835v2
- Date: Thu, 14 Mar 2024 12:01:11 GMT
- Title: IBVC: Interpolation-driven B-frame Video Compression
- Authors: Chenming Xu, Meiqin Liu, Chao Yao, Weisi Lin, Yao Zhao
- Abstract summary: B-frame video compression aims to adopt bi-directional motion estimation and motion compensation (MEMC) coding for middle frame reconstruction.
Previous learned approaches often directly extend neural P-frame codecs to B-frame coding, relying on bi-directional optical-flow estimation, and suffer from inaccurate quantized motions and inefficient motion compensation.
We propose a simple yet effective structure called Interpolation-driven B-frame Video Compression (IBVC) to address these issues.
- Score: 68.18440522300536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learned B-frame video compression aims to adopt bi-directional motion estimation and motion compensation (MEMC) coding for middle frame reconstruction. However, previous learned approaches often directly extend neural P-frame codecs to B-frame coding, relying on bi-directional optical-flow estimation or video frame interpolation. They suffer from inaccurate quantized motions and inefficient motion compensation. To address these issues, we propose a simple yet effective structure called Interpolation-driven B-frame Video Compression (IBVC). Our approach involves only two major operations: video frame interpolation and artifact reduction compression. IBVC introduces a bit-rate-free MEMC based on interpolation, which avoids optical-flow quantization and additional compression distortions. Subsequently, to reduce duplicate bit-rate consumption and focus on unaligned artifacts, a residual-guided masking encoder is deployed to adaptively select meaningful contexts with interpolated multi-scale dependencies. In addition, a conditional spatio-temporal decoder is proposed to eliminate location errors and artifacts, instead of the MEMC coding used in other methods. Experimental results on B-frame coding demonstrate that IBVC achieves significant improvements over the relevant state-of-the-art methods. Meanwhile, our approach saves bit rate compared with the random access (RA) configuration of H.266 (VTM). The code will be available at https://github.com/ruhig6/IBVC.
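The two-operation design lends itself to a compact sketch. Below is a minimal, illustrative PyTorch skeleton of the pipeline as the abstract describes it; the module interfaces (vfi_net, encoder, decoder), the frozen interpolator, and the use of a subtraction residual as the masking cue are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class IBVCSketch(nn.Module):
    def __init__(self, vfi_net: nn.Module, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.vfi = vfi_net    # bit-rate-free MEMC via frame interpolation
        self.enc = encoder    # residual-guided masking encoder
        self.dec = decoder    # conditional spatio-temporal decoder

    def forward(self, ref_prev, ref_next, target):
        # Interpolate the B-frame from its two reconstructed references;
        # no motion vectors are quantized or transmitted for this step.
        with torch.no_grad():  # assumption: interpolator kept fixed
            x_hat = self.vfi(ref_prev, ref_next)
        # The residual highlights unaligned artifacts, letting the encoder
        # mask out regions the interpolation already handled well.
        residual = target - x_hat
        latents = self.enc(target, x_hat, residual)
        # The decoder refines the interpolated frame conditioned on the
        # latents rather than running a second MEMC pass.
        return self.dec(latents, x_hat)
```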
Related papers
- Bi-Directional Deep Contextual Video Compression [17.195099321371526]
We introduce a bi-directional deep contextual video compression scheme tailored for B-frames, termed DCVC-B.
First, we develop a bi-directional motion difference context propagation method for effective motion difference coding.
Second, we propose a bi-directional contextual compression model and a corresponding bi-directional temporal entropy model.
Third, we propose a hierarchical quality structure-based training strategy, leading to an effective bit allocation across large groups of pictures.
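The hierarchical quality structure this strategy allocates bits over is the standard dyadic B-frame GOP. A minimal sketch of that coding order follows; the layer-to-lambda mapping is illustrative, not from the paper.

```python
def hierarchical_order(lo: int, hi: int, layer: int = 1):
    """Yield (frame_index, temporal_layer) for the B-frames between two
    already-coded anchor frames lo and hi, in dyadic coding order."""
    if hi - lo < 2:
        return
    mid = (lo + hi) // 2
    yield mid, layer
    yield from hierarchical_order(lo, mid, layer + 1)
    yield from hierarchical_order(mid, hi, layer + 1)

# Anchors at 0 and 8; shallower layers get finer quality (illustrative).
for idx, layer in hierarchical_order(0, 8):
    lam = 256 / (2 ** (layer - 1))
    print(f"frame {idx}: temporal layer {layer}, lambda {lam:g}")
```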
arXiv Detail & Related papers (2024-08-16T08:45:25Z)
- Perception-Oriented Video Frame Interpolation via Asymmetric Blending [20.0024308216849]
Previous methods for Video Frame Interpolation (VFI) have encountered challenges, notably blur and ghosting artifacts.
We propose PerVFI (Perception-oriented Video Frame Interpolation) to mitigate these challenges.
Experimental results validate the superiority of PerVFI, demonstrating significant improvements in perceptual quality compared to existing methods.
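A loose sketch of what asymmetric blending means at the pixel level, assuming two references warped to the target time and a predicted per-pixel weight map; PerVFI's actual formulation is more elaborate.

```python
import torch

def asymmetric_blend(warp0: torch.Tensor, warp1: torch.Tensor,
                     alpha: torch.Tensor) -> torch.Tensor:
    """warp0/warp1: the two references warped to the target time;
    alpha: per-pixel weights in [0, 1], shape (B, 1, H, W)."""
    return alpha * warp0 + (1.0 - alpha) * warp1
```

With alpha fixed at 0.5 this reduces to the symmetric average that superimposes misaligned content as ghosting; letting a network push alpha toward the more reliable reference around occlusions is the asymmetry.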
arXiv Detail & Related papers (2024-04-10T02:40:17Z)
- Predictive Coding For Animation-Based Video Compression [13.161311799049978]
We propose a predictive coding scheme which uses image animation as a predictor, and codes the residual with respect to the actual target frame.
Our experiments indicate a significant gain, in excess of 70% compared to the HEVC video standard and over 30% compared to VVC.
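A minimal sketch of this predict-then-code-the-residual loop, with kp_extract, animate, and codec as hypothetical stand-ins for the paper's keypoint extractor, animation model, and residual codec.

```python
def encode(target, reference, kp_extract, animate, codec):
    kps = kp_extract(target)              # compact motion cues, also transmitted
    prediction = animate(reference, kps)  # image-animation predictor
    return kps, codec.compress(target - prediction)  # code only the residual

def decode(kps, bits, reference, animate, codec):
    prediction = animate(reference, kps)  # same prediction on the decoder side
    return prediction + codec.decompress(bits)
```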
arXiv Detail & Related papers (2023-07-09T14:40:54Z)
- Video Frame Interpolation with Densely Queried Bilateral Correlation [52.823751291070906]
Video Frame Interpolation (VFI) aims to synthesize non-existent intermediate frames between existent frames.
Flow-based VFI algorithms estimate intermediate motion fields to warp the existent frames.
We propose Densely Queried Bilateral Correlation (DQBC) that gets rid of the receptive field dependency problem.
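For intuition, a generic local correlation volume between the features of the two frames is sketched below; DQBC's densely queried bilateral scheme is more involved, so treat this only as the building block the summary refers to.

```python
import torch
import torch.nn.functional as F

def local_correlation(fa: torch.Tensor, fb: torch.Tensor, radius: int = 3):
    """fa, fb: (B, C, H, W) features of the two frames.
    Returns a (B, (2r+1)**2, H, W) local correlation volume."""
    b, c, h, w = fa.shape
    k = 2 * radius + 1
    # Gather the k*k shifted neighbourhoods of fb around every position.
    patches = F.unfold(F.pad(fb, [radius] * 4), kernel_size=k)
    patches = patches.view(b, c, k * k, h, w)
    # Dot product of each fa feature with every neighbour, scaled by C.
    return (fa.unsqueeze(2) * patches).sum(dim=1) / c ** 0.5
```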
arXiv Detail & Related papers (2023-04-26T14:45:09Z)
- Learned Video Compression via Heterogeneous Deformable Compensation Network [78.72508633457392]
We propose a learned video compression framework with a heterogeneous deformable compensation strategy (HDCVC) to tackle the problem of unstable compression performance.
More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-neighborhood heterogeneous deformable (HetDeform) kernel offsets.
Experimental results indicate that HDCVC outperforms recent state-of-the-art learned video compression approaches.
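A hedged sketch of deformable compensation in this spirit, using torchvision's deform_conv2d; the offset head below is illustrative, not the HetDeform design.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableCompensation(nn.Module):
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.k = k
        # Predict (dx, dy) per kernel tap from both frames' features.
        self.offset_head = nn.Conv2d(2 * channels, 2 * k * k, 3, padding=1)
        self.weight = nn.Parameter(torch.randn(channels, channels, k, k) * 0.01)

    def forward(self, ref_feat, cur_feat):
        offsets = self.offset_head(torch.cat([ref_feat, cur_feat], dim=1))
        # Deformable convolution samples ref_feat at the learned offsets,
        # aligning the reference features with the current frame.
        return deform_conv2d(ref_feat, offsets, self.weight, padding=self.k // 2)
```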
arXiv Detail & Related papers (2022-07-11T02:31:31Z)
- Self-Supervised Learning of Perceptually Optimized Block Motion Estimates for Video Compression [50.48504867843605]
We propose a search-free block motion estimation framework using a multi-stage convolutional neural network.
We deploy the multi-scale structural similarity (MS-SSIM) loss function to optimize the perceptual quality of the motion compensated predicted frames.
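For concreteness, a single-scale SSIM loss is sketched below as the perceptual objective on the motion-compensated prediction; the paper uses the multi-scale variant (MS-SSIM), which repeats this computation over a downsampling pyramid.

```python
import torch.nn.functional as F

def ssim_loss(pred, target, c1=0.01 ** 2, c2=0.03 ** 2, win=11):
    """1 - mean SSIM over win x win uniform windows; inputs scaled to [0, 1]."""
    mean = lambda x: F.avg_pool2d(x, win, 1, win // 2, count_include_pad=False)
    mu_p, mu_t = mean(pred), mean(target)
    var_p = mean(pred * pred) - mu_p ** 2
    var_t = mean(target * target) - mu_t ** 2
    cov = mean(pred * target) - mu_p * mu_t
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    return 1.0 - ssim.mean()
```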
arXiv Detail & Related papers (2021-10-05T03:38:43Z)
- FVC: A New Framework towards Deep Video Compression in Feature Space [21.410266039564803]
We propose a feature-space video coding network (FVC) by performing all major operations (i.e., motion estimation, motion compression, motion compensation and residual compression) in the feature space.
The proposed framework achieves the state-of-the-art performance on four benchmark datasets including HEVC, UVG, VTL and MCL-JCV.
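Conceptually, the pipeline moves every coding operation from pixels to features; a schematic sketch with all networks as placeholder callables follows.

```python
def compress_frame(cur, ref, encode, motion_est, motion_codec, warp, res_codec, decode):
    f_cur, f_ref = encode(cur), encode(ref)        # pixels -> feature space
    mv = motion_est(f_cur, f_ref)                  # motion estimation on features
    mv_hat, mv_bits = motion_codec(mv)             # motion compression
    f_pred = warp(f_ref, mv_hat)                   # motion compensation on features
    res_hat, res_bits = res_codec(f_cur - f_pred)  # residual compression on features
    return decode(f_pred + res_hat), mv_bits + res_bits
```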
arXiv Detail & Related papers (2021-05-20T08:55:32Z)
- All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced Motion Modeling [52.425236515695914]
State-of-the-art methods are iterative solutions that interpolate one frame at a time.
This work introduces a true multi-frame interpolator.
It utilizes a pyramid-style network in the temporal domain to complete the multi-frame task in one shot.
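The contrast between the two strategies can be sketched as follows; OneShotInterp is a placeholder interface, not the paper's pyramidal network.

```python
import torch
import torch.nn as nn

def iterative_interp(f0, f1, midpoint_net, depth):
    """Prior iterative strategy: recursively insert midpoints, one frame
    per network call (2**depth - 1 frames total)."""
    if depth == 0:
        return []
    mid = midpoint_net(f0, f1)
    return (iterative_interp(f0, mid, midpoint_net, depth - 1)
            + [mid]
            + iterative_interp(mid, f1, midpoint_net, depth - 1))

class OneShotInterp(nn.Module):
    """One-shot strategy: all n intermediates from a single forward pass."""
    def __init__(self, n: int):
        super().__init__()
        self.n = n
        self.body = nn.Conv2d(6, 3 * n, 3, padding=1)  # stand-in for the pyramid

    def forward(self, f0, f1):
        return self.body(torch.cat([f0, f1], dim=1)).chunk(self.n, dim=1)
```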
arXiv Detail & Related papers (2020-07-23T02:34:39Z)
- M-LVC: Multiple Frames Prediction for Learned Video Compression [111.50760486258993]
We propose an end-to-end learned video compression scheme for low-latency scenarios.
In our scheme, the motion vector (MV) field is calculated between the current frame and the previous one.
Experimental results show that the proposed method outperforms the existing learned video compression methods for low-latency mode.
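A hedged sketch of such a low-latency coding loop, with flow_net, mv_codec, warp, and res_codec as stand-ins for M-LVC's networks; the multiple-reference details are simplified to buffers passed as context.

```python
def code_frame(cur, recon_buf, mv_buf, flow_net, mv_codec, warp, res_codec):
    mv = flow_net(cur, recon_buf[-1])       # MV field vs. the previous frame only
    mv_hat, mv_bits = mv_codec(mv, mv_buf)  # coded given previously sent MVs
    pred = warp(recon_buf[-1], mv_hat)      # motion compensation
    res_hat, res_bits = res_codec(cur - pred, recon_buf)  # multi-frame context
    recon = pred + res_hat
    recon_buf.append(recon)
    mv_buf.append(mv_hat)
    return recon, mv_bits + res_bits
```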
arXiv Detail & Related papers (2020-04-21T20:42:02Z)
- End-to-End Learning for Video Frame Compression with Self-Attention [25.23586503813838]
We propose an end-to-end learned system for compressing video frames.
Our system learns deep embeddings of frames and encodes their difference in latent space.
In our experiments, we show that the proposed system achieves high compression rates and high objective visual quality.
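A minimal sketch of coding the frame difference in latent space, with embed, quantize, and decode as placeholder networks; entropy coding and the paper's self-attention are omitted.

```python
def compress(cur, prev_recon, embed, quantize):
    z_cur, z_prev = embed(cur), embed(prev_recon)
    return quantize(z_cur - z_prev)  # only the latent difference is coded

def decompress(dz, prev_recon, embed, decode):
    return decode(embed(prev_recon) + dz)  # add the difference back, then decode
```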
arXiv Detail & Related papers (2020-04-20T12:11:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.