Neural B-frame Video Compression with Bi-directional Reference Harmonization
- URL: http://arxiv.org/abs/2511.08938v1
- Date: Thu, 13 Nov 2025 01:20:06 GMT
- Title: Neural B-frame Video Compression with Bi-directional Reference Harmonization
- Authors: Yuxi Liu, Dengchao Jin, Shuai Huo, Jiawen Gu, Chao Zhou, Huihui Bai, Ming Lu, Zhan Ma
- Abstract summary: We propose a novel neural B-frame video compression method, Bi-directional Reference Harmonization Video Compression (BRHVC). BRHVC uses Bi-directional Motion Converge (BMC) and Bi-directional Contextual Fusion (BCF) to optimize reference information utilization. Experimental results indicate that BRHVC outperforms previous state-of-the-art NVC methods.
- Score: 50.067848395760755
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural video compression (NVC) has made significant progress in recent years, while neural B-frame video compression (NBVC) remains underexplored compared to P-frame compression. NBVC can adopt bi-directional reference frames for better compression performance. However, NBVC's hierarchical coding may complicate continuous temporal prediction, especially at some hierarchical levels with a large frame span, which could cause the contribution of the two reference frames to be unbalanced. To optimize reference information utilization, we propose a novel NBVC method, termed Bi-directional Reference Harmonization Video Compression (BRHVC), with the proposed Bi-directional Motion Converge (BMC) and Bi-directional Contextual Fusion (BCF). BMC converges multiple optical flows in motion compression, leading to more accurate motion compensation on a larger scale. Then BCF explicitly models the weights of reference contexts under the guidance of motion compensation accuracy. With more efficient motions and contexts, BRHVC can effectively harmonize bi-directional references. Experimental results indicate that our BRHVC outperforms previous state-of-the-art NVC methods, even surpassing the traditional coding, VTM-RA (under random access configuration), on the HEVC datasets. The source code is released at https://github.com/kwai/NVC.
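The abstract describes BCF as weighting the two reference contexts under the guidance of motion compensation accuracy. A minimal NumPy sketch of that general idea follows; the function name, the per-pixel softmax form, and the `beta` temperature are illustrative assumptions, not the paper's actual BCF module:

```python
import numpy as np

def fuse_bidirectional_contexts(ctx_fwd, ctx_bwd, err_fwd, err_bwd, beta=1.0):
    """Blend two warped reference contexts, trusting the one with
    lower motion-compensation error more at each pixel.

    ctx_fwd, ctx_bwd: warped reference features, shape (C, H, W)
    err_fwd, err_bwd: per-pixel compensation error maps, shape (H, W)
    beta: temperature controlling how sharply low error is favored
    """
    # Lower error -> higher weight, via a per-pixel softmax over the two refs.
    logits = np.stack([-beta * err_fwd, -beta * err_bwd])  # (2, H, W)
    logits -= logits.max(axis=0, keepdims=True)            # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=0, keepdims=True)                      # weights sum to 1
    return w[0] * ctx_fwd + w[1] * ctx_bwd
```

With equal errors this reduces to an even average; when one reference spans a much larger temporal distance and warps poorly, its weight decays toward zero, which is the imbalance the paper says arises at hierarchical levels with a large frame span.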
Related papers
- Content Adaptive based Motion Alignment Framework for Learned Video Compression [72.13599533975413]
This paper proposes a content-adaptive motion alignment framework. First, a two-stage flow-guided deformable warping mechanism refines motion compensation with coarse-to-fine offset prediction and mask modulation. Second, a multi-reference quality-aware strategy adjusts distortion weights based on reference quality and is applied to hierarchical training to reduce error propagation. Third, a training-free module downsamples frames by motion magnitude and resolution to obtain smooth motion estimation.
arXiv Detail & Related papers (2025-12-15T02:51:47Z) - Fine-Grained Motion Compression and Selective Temporal Fusion for Neural B-Frame Video Coding [27.315485948158006]
We propose novel enhancements to motion compression and temporal fusion for neural B-frame coding. First, the method incorporates an interactive dual-branch motion auto-encoder with per-branch adaptive quantization steps. Second, a selective temporal fusion method predicts bi-directional fusion weights to achieve discriminative utilization of bi-directional multi-scale temporal contexts.
arXiv Detail & Related papers (2025-06-09T12:51:10Z) - Neural Video Compression with Context Modulation [9.875413481663742]
In this paper, we address this limitation by modulating the temporal context with the reference frame in two steps. We achieve an average 22.7% bit-rate reduction over the advanced traditional video codec H.266/VVC and an average 10.1% saving over the previous state-of-the-art NVC DCVC-FM.
arXiv Detail & Related papers (2025-05-20T15:57:09Z) - BiECVC: Gated Diversification of Bidirectional Contexts for Learned Video Compression [12.60355288519781]
We propose BiECVC, a learned bidirectional video compression (BVC) framework that incorporates diversified local and non-local context modeling. BiECVC achieves state-of-the-art performance, reducing the bit-rate by 13.4% and 15.7% compared to VTM 13.2 under the Random Access (RA) configuration. To our knowledge, BiECVC is the first learned video codec to surpass VTM 13.2 across all standard test datasets.
arXiv Detail & Related papers (2025-05-14T06:55:37Z) - Improved Video VAE for Latent Video Diffusion Model [55.818110540710215]
Video Autoencoders (VAEs) aim to compress pixel data into a low-dimensional latent space, playing an important role in OpenAI's Sora.
Most existing VAEs inflate a pretrained image VAE into a 3D causal structure for temporal-spatial compression.
We propose a new KTC architecture and a group causal convolution (GCConv) module to further improve video VAE (IV-VAE).
arXiv Detail & Related papers (2024-11-10T12:43:38Z) - Bi-Directional Deep Contextual Video Compression [17.195099321371526]
We introduce a bi-directional deep contextual video compression scheme tailored for B-frames, termed DCVC-B. First, we develop a bi-directional motion difference context propagation method for effective motion difference coding. Second, we propose a bi-directional contextual compression model and a corresponding bi-directional temporal entropy model. Third, we propose a hierarchical quality structure-based training strategy, leading to effective bit allocation across large groups of pictures.
arXiv Detail & Related papers (2024-08-16T08:45:25Z) - IBVC: Interpolation-driven B-frame Video Compression [68.18440522300536]
B-frame video compression aims to adopt bi-directional motion estimation and motion compensation (MEMC) coding for middle frame reconstruction.
Previous learned approaches often directly extend neural P-frame codecs to B-frames, relying on bi-directional optical-flow estimation.
We propose a simple yet effective structure called Interpolation-B-frame Video Compression (IBVC) to address these issues.
arXiv Detail & Related papers (2023-09-25T02:45:51Z) - Learned Video Compression via Heterogeneous Deformable Compensation Network [78.72508633457392]
We propose a learned video compression framework via heterogeneous deformable compensation strategy (HDCVC) to tackle the problems of unstable compression performance.
More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-neighborhood heterogeneous deformable (HetDeform) kernel offsets.
Experimental results indicate that HDCVC achieves superior performance compared to recent state-of-the-art learned video compression approaches.
arXiv Detail & Related papers (2022-07-11T02:31:31Z) - Perceptual Learned Video Compression with Recurrent Conditional GAN [158.0726042755]
We propose a Perceptual Learned Video Compression (PLVC) approach with recurrent conditional generative adversarial network.
PLVC learns to compress video toward good perceptual quality at low bit-rates.
The user study further validates the outstanding perceptual performance of PLVC in comparison with the latest learned video compression approaches.
arXiv Detail & Related papers (2021-09-07T13:36:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.