BiECVC: Gated Diversification of Bidirectional Contexts for Learned Video Compression
- URL: http://arxiv.org/abs/2505.09193v4
- Date: Thu, 24 Jul 2025 16:57:30 GMT
- Title: BiECVC: Gated Diversification of Bidirectional Contexts for Learned Video Compression
- Authors: Wei Jiang, Junru Li, Kai Zhang, Li Zhang
- Abstract summary: We propose BiECVC, a learned bidirectional video compression (BVC) framework that incorporates diversified local and non-local context modeling. BiECVC achieves state-of-the-art performance, reducing the bit-rate by 13.4% and 15.7% compared to VTM 13.2 under the Random Access (RA) configuration. To our knowledge, BiECVC is the first learned video codec to surpass VTM 13.2 across all standard test datasets.
- Score: 12.60355288519781
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent forward prediction-based learned video compression (LVC) methods have achieved impressive results, even surpassing the VVC reference software VTM under the Low Delay B (LDB) configuration. In contrast, learned bidirectional video compression (BVC) remains underexplored and still lags behind its forward-only counterparts. This performance gap is mainly due to the limited ability to extract diverse and accurate contexts: most existing BVCs primarily exploit temporal motion while neglecting non-local correlations across frames. Moreover, they lack the adaptability to dynamically suppress harmful contexts arising from fast motion or occlusion. To tackle these challenges, we propose BiECVC, a BVC framework that incorporates diversified local and non-local context modeling along with adaptive context gating. For local context enhancement, BiECVC reuses high-quality features from lower layers and aligns them using decoded motion vectors without introducing extra motion overhead. To model non-local dependencies efficiently, we adopt a linear attention mechanism that balances performance and complexity. To further mitigate the impact of inaccurate context prediction, we introduce Bidirectional Context Gating, inspired by data-dependent decay in recent autoregressive language models, to dynamically filter contextual information based on conditional coding results. Extensive experiments demonstrate that BiECVC achieves state-of-the-art performance, reducing the bit-rate by 13.4% and 15.7% compared to VTM 13.2 under the Random Access (RA) configuration with intra periods of 32 and 64, respectively. To our knowledge, BiECVC is the first learned video codec to surpass VTM 13.2 under RA across all standard test datasets.
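Two mechanisms named in the abstract are concrete enough to sketch: linear attention for non-local context, and data-dependent gating of the bidirectional contexts. The PyTorch sketch below is a hypothetical illustration under stated assumptions, not the authors' implementation: the elu(x)+1 feature map is one common linear-attention choice (BiECVC's exact variant is unspecified here), the gate is conditioned on the current frame's features as a stand-in for the paper's "conditional coding results", and all module and parameter names are invented for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearAttention(nn.Module):
    """O(N) non-local context modeling via kernelized (linear) attention.

    Uses the common phi(x) = elu(x) + 1 feature map; BiECVC's exact
    attention variant may differ (assumption).
    """
    def __init__(self, dim, heads=4):
        super().__init__()
        self.heads = heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, N, C) with N = H * W tokens
        B, N, C = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (t.reshape(B, N, self.heads, C // self.heads).transpose(1, 2)
                   for t in (q, k, v))  # -> (B, heads, N, d)
        q, k = F.elu(q) + 1, F.elu(k) + 1            # positive feature maps
        kv = torch.einsum('bhnd,bhne->bhde', k, v)   # O(N) key-value summary
        z = 1.0 / (torch.einsum('bhnd,bhd->bhn', q, k.sum(dim=2)) + 1e-6)
        out = torch.einsum('bhnd,bhde,bhn->bhne', q, kv, z)
        return self.proj(out.transpose(1, 2).reshape(B, N, C))


class BidirectionalContextGate(nn.Module):
    """Data-dependent gating of forward/backward temporal contexts.

    A sigmoid gate, conditioned here on the current frame's features
    (a stand-in for the paper's conditional coding results), scales each
    context so unreliable ones (fast motion, occlusion) are suppressed.
    """
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels * 3, channels * 2, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels * 2, channels * 2, 3, padding=1),
        )

    def forward(self, cur, ctx_fwd, ctx_bwd):  # each (B, C, H, W)
        g = torch.sigmoid(self.gate(torch.cat([cur, ctx_fwd, ctx_bwd], 1)))
        g_f, g_b = g.chunk(2, dim=1)           # per-pixel, per-channel gates
        return g_f * ctx_fwd + g_b * ctx_bwd   # gated fused bidirectional context
```

The linear-attention branch costs O(N) in the number of spatial tokens because the key-value summary is formed once and reused for every query, which is the performance/complexity trade-off the abstract alludes to.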
Related papers
- Neural Video Compression with Context Modulation [9.875413481663742]
In this paper, we address the limitation by modulating the temporal context with the reference frame in two steps. We achieve an average 22.7% bit-rate reduction over the advanced traditional video codec H.266/VVC, and an average 10.1% saving over the previous state-of-the-art NVC DCVC-FM.
arXiv Detail & Related papers (2025-05-20T15:57:09Z) - Augmented Deep Contexts for Spatially Embedded Video Coding [8.213635577747638]
Most Neural Video Codecs (NVCs) only employ temporal references to generate temporal-only contexts and a latent prior. We propose a Spatially Embedded Video Codec (SEVC) in which the low-resolution video is compressed for spatial references. Our SEVC effectively alleviates the limitations in handling large motions or emerging objects, and achieves an 11.9% greater bit-rate reduction than the previous state-of-the-art NVC.
arXiv Detail & Related papers (2025-05-08T14:57:52Z) - Improved Video VAE for Latent Video Diffusion Model [55.818110540710215]
A video Variational Autoencoder (VAE) aims to compress pixel data into a low-dimensional latent space, playing an important role in OpenAI's Sora.
Most existing VAEs inflate a pretrained image VAE into a 3D causal structure for temporal-spatial compression.
We propose a new KTC architecture and a group causal convolution (GCConv) module to further improve video VAE (IV-VAE).
arXiv Detail & Related papers (2024-11-10T12:43:38Z) - Bi-Directional Deep Contextual Video Compression [17.195099321371526]
We introduce a bi-directional deep contextual video compression scheme tailored for B-frames, termed DCVC-B. First, we develop a bi-directional motion difference context propagation method for effective motion difference coding. Second, we propose a bi-directional contextual compression model and a corresponding bi-directional temporal entropy model. Third, we propose a hierarchical quality structure-based training strategy, leading to an effective bit allocation across large groups of pictures.
arXiv Detail & Related papers (2024-08-16T08:45:25Z) - Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding [108.79026216923984]
Video grounding aims to localize a spatio-temporal section in a video corresponding to an input text query.
This paper addresses a critical limitation in current video grounding methodologies by introducing an Open-Vocabulary Spatio-Temporal Video Grounding task.
arXiv Detail & Related papers (2023-12-31T13:53:37Z) - IBVC: Interpolation-driven B-frame Video Compression [68.18440522300536]
B-frame video compression aims to adopt bi-directional motion estimation and motion compensation (MEMC) coding for middle frame reconstruction.
Previous learned approaches often directly extend neural P-frame codecs to B-frame coding, relying on bi-directional optical-flow estimation.
We propose a simple yet effective structure called Interpolation-B-frame Video Compression (IBVC) to address these issues.
arXiv Detail & Related papers (2023-09-25T02:45:51Z) - Neural Video Compression with Temporal Layer-Adaptive Hierarchical
B-frame Coding [5.8550373172233305]
We propose an NVC model exploiting hierarchical B-frame coding with temporal layer-adaptive optimization.
The model achieves an impressive BD-rate gain of -39.86% against the baseline.
It also resolves the challenges in sequences with large or complex motions, with up to -49.13% more BD-rate gain than the simple bidirectional extension.
arXiv Detail & Related papers (2023-08-30T06:49:34Z) - VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video
Anomaly Detection [58.47940430618352]
We propose VadCLIP, a new paradigm for weakly supervised video anomaly detection (WSVAD).
VadCLIP makes full use of fine-grained associations between vision and language on the strength of CLIP.
We conduct extensive experiments on two commonly-used benchmarks, demonstrating that VadCLIP achieves the best performance on both coarse-grained and fine-grained WSVAD.
arXiv Detail & Related papers (2023-08-22T14:58:36Z) - You Can Ground Earlier than See: An Effective and Efficient Pipeline for
Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous works have achieved decent success, but they only focus on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z) - Perceptual Learned Video Compression with Recurrent Conditional GAN [158.0726042755]
We propose a Perceptual Learned Video Compression (PLVC) approach with a recurrent conditional generative adversarial network.
PLVC learns to compress video towards good perceptual quality at low bit-rate.
The user study further validates the outstanding perceptual performance of PLVC in comparison with the latest learned video compression approaches.
arXiv Detail & Related papers (2021-09-07T13:36:57Z) - Neural Video Coding using Multiscale Motion Compensation and
Spatiotemporal Context Model [45.46660511313426]
We propose an end-to-end deep neural video coding framework (NVC).
It uses variational autoencoders (VAEs) with joint spatial and temporal prior aggregation (PA) to exploit the correlations in intra-frame pixels, inter-frame motions, and inter-frame compensation residuals; a rough sketch of the prior-aggregation idea follows this entry.
NVC is evaluated under low-delay causal settings and compared with H.265/HEVC, H.264/AVC, and other learned video compression methods.
arXiv Detail & Related papers (2020-07-09T06:15:17Z)
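Since the last entry hinges on joint spatial and temporal prior aggregation, here is a rough, hedged sketch of that general idea, not the NVC paper's actual architecture; all names, channel counts, and layer choices are assumptions. A hyperprior branch supplies a spatial prior, a convolutional branch over the previous frame's latent supplies a temporal prior, and their fusion predicts Gaussian entropy-model parameters.

```python
import torch
import torch.nn as nn

class JointPriorAggregation(nn.Module):
    """Toy illustration of joint spatial-temporal prior aggregation (PA).

    A hyperprior decoder supplies a spatial prior for the current latent,
    a convolutional branch over the previous frame's latent supplies a
    temporal prior, and their fusion predicts the mean/scale of a
    Gaussian entropy model (all shapes and names are assumptions).
    """
    def __init__(self, c=192):
        super().__init__()
        self.hyper_dec = nn.Sequential(  # spatial prior from hyper-latent (4x upsample)
            nn.ConvTranspose2d(c, c, 5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(c, c, 5, stride=2, padding=2, output_padding=1),
        )
        self.temporal = nn.Sequential(   # temporal prior from previous frame's latent
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c, c, 3, padding=1),
        )
        self.fuse = nn.Conv2d(2 * c, 2 * c, 1)  # -> (mean, log-scale) per channel

    def forward(self, hyper_latent, prev_latent):
        prior = torch.cat([self.hyper_dec(hyper_latent),
                           self.temporal(prev_latent)], dim=1)
        mean, log_scale = self.fuse(prior).chunk(2, dim=1)
        return mean, torch.exp(log_scale)  # Gaussian parameters for the latent
```

Rate would then be estimated from the Gaussian likelihood of the quantized latent under the predicted (mean, scale), as in standard hyperprior-based codecs.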