Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment
- URL: http://arxiv.org/abs/2512.10450v1
- Date: Thu, 11 Dec 2025 09:14:51 GMT
- Title: Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment
- Authors: Han Li, Shaohui Li, Wenrui Dai, Chenglin Li, Xinlong Pan, Haipeng Wang, Junni Zou, Hongkai Xiong,
- Abstract summary: We propose a novel unifiedtransform framework with dual-domain progressive temporal alignment and quality-conditioned mixture-of-expert (QCMoE)<n>QCMoE allows continuous and consistent rate control with appealing R-D performance.<n> Experimental results show that the proposed method achieves competitive R-D performance compared with the state-of-the-arts.
- Score: 92.57576987521107
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Existing frameworks for learned video compression suffer from a dilemma between inaccurate temporal alignment and error propagation for motion estimation and compensation (ME/MC). The separate-transform framework employs distinct transforms for intra-frame and inter-frame compression to yield impressive rate-distortion (R-D) performance but causes evident error propagation, while the unified-transform framework eliminates error propagation via shared transforms but is inferior in ME/MC in shared latent domains. To address this limitation, in this paper, we propose a novel unifiedtransform framework with dual-domain progressive temporal alignment and quality-conditioned mixture-of-expert (QCMoE) to enable quality-consistent and error-propagation-free streaming for learned video compression. Specifically, we propose dualdomain progressive temporal alignment for ME/MC that leverages coarse pixel-domain alignment and refined latent-domain alignment to significantly enhance temporal context modeling in a coarse-to-fine fashion. The coarse pixel-domain alignment efficiently handles simple motion patterns with optical flow estimated from a single reference frame, while the refined latent-domain alignment develops a Flow-Guided Deformable Transformer (FGDT) over latents from multiple reference frames to achieve long-term motion refinement (LTMR) for complex motion patterns. Furthermore, we design a QCMoE module for continuous bit-rate adaptation that dynamically assigns different experts to adjust quantization steps per pixel based on target quality and content rather than relies on a single quantization step. QCMoE allows continuous and consistent rate control with appealing R-D performance. Experimental results show that the proposed method achieves competitive R-D performance compared with the state-of-the-arts, while successfully eliminating error propagation.
Related papers
- EchoTorrent: Towards Swift, Sustained, and Streaming Multi-Modal Video Generation [8.795438456031512]
Multi-modal generation models have achieved high visual quality, but their prohibitive latency and limited temporal stability hinder real-time deployment.<n> Streaming inference exacerbates these issues, leading to pronounced multimodal ambiguities, such as blurring, temporal drift, and lip dechronization.<n>We propose EchoTorrent, a novel novel with a fourfold schema: Multi-Teacher Training fine-tunes a pre-trained model on distinct preference domains; Adaptive DMD (ACCDMD) calibrates the audio CFG degradation errors in phased via a schedule; Long Hybrid Tail, which enforces alignment exclusively on tail frames during long-horizon self-roll
arXiv Detail & Related papers (2026-02-14T08:32:38Z) - D$^2$-VR: Degradation-Robust and Distilled Video Restoration with Synergistic Optimization Strategy [7.553742541566094]
integration of diffusion priors with temporal alignment has emerged as a transformative paradigm for video restoration, delivering fantastic perceptual quality.<n>We propose textbfD$2$-VR, a single-image diffusion-based video-restoration framework with low-step inference.
arXiv Detail & Related papers (2026-02-09T08:52:51Z) - Content Adaptive based Motion Alignment Framework for Learned Video Compression [72.13599533975413]
This paper proposes a content adaptive based motion alignment framework.<n>We first introduce a two-stage flow-guided deformable warping mechanism that refines motion compensation with coarse-to-fine offset prediction and mask modulation.<n>Second, we propose a multi-reference quality aware strategy that adjusts distortion weights based on reference quality, and applies it to hierarchical training to reduce error propagation.<n>Third, we integrate a training-free module that downsamples frames by motion magnitude and resolution to obtain smooth motion estimation.
arXiv Detail & Related papers (2025-12-15T02:51:47Z) - Morphing Through Time: Diffusion-Based Bridging of Temporal Gaps for Robust Alignment in Change Detection [51.56484100374058]
We introduce a modular pipeline that improves spatial and temporal robustness without altering existing change detection networks.<n>A diffusion module synthesizes intermediate morphing frames that bridge large appearance gaps, enabling RoMa to estimate stepwise correspondences.<n>Experiments on LEVIR-CD, WHU-CD, and DSIFN-CD show consistent gains in both registration accuracy and downstream change detection.
arXiv Detail & Related papers (2025-11-11T08:40:28Z) - Bidirectional Feature-aligned Motion Transformation for Efficient Dynamic Point Cloud Compression [97.66080040613726]
We propose a Bidirectional Feature-aligned Motion Transformation (Bi-FMT) framework that implicitly models motion in the feature space.<n>Bi-FMT aligns features across both past and future frames to produce temporally consistent latent representations.<n>We show Bi-FMT surpasses D-DPCC and AdaDPCC in both compression efficiency and runtime.
arXiv Detail & Related papers (2025-09-18T03:51:06Z) - Multipath Interference Suppression in Indirect Time-of-Flight Imaging via a Novel Compressed Sensing Framework [0.2710246456535607]
We propose a novel compressed sensing method to improve the depth reconstruction accuracy and multi-target separation capability of indirect Time-of-Flight (iToF) systems.<n>Our method operates with a single modulation frequency and constructs the sensing matrix using multiple phase shifts and narrow-duty-cycle continuous waves.<n> Experimental results demonstrate that the proposed method outperforms traditional approaches in both reconstruction accuracy and robustness, without requiring any additional hardware changes.
arXiv Detail & Related papers (2025-07-24T03:54:27Z) - Enhancing Low-Cost Video Editing with Lightweight Adaptors and Temporal-Aware Inversion [28.38307253613529]
We propose a framework that integrates temporal-spatial and semantic consistency with Baliteral DDIM inversion.<n>Our method significantly improves perceptual quality, text-image alignment, and temporal coherence, as demonstrated on the MSR-VTT dataset.
arXiv Detail & Related papers (2025-01-08T16:41:31Z) - Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames.
It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z) - Spatial-Temporal Transformer based Video Compression Framework [44.723459144708286]
We propose a novel Spatial-Temporal Transformer based Video Compression (STT-VC) framework.
It contains a Relaxed Deformable Transformer (RDT) with Uformer based offsets estimation for motion estimation and compensation, a Multi-Granularity Prediction (MGP) module based on multi-reference frames for prediction refinement, and a Spatial Feature Distribution prior based Transformer (SFD-T) for efficient temporal-spatial joint residual compression.
Experimental results demonstrate that our method achieves the best result with 13.5% BD-Rate saving over VTM.
arXiv Detail & Related papers (2023-09-21T09:23:13Z) - RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GAN to real images.
Existing methods invert video frames individually often leading to undesired inconsistent results over time.
We propose a unified recurrent framework, named textbfRecurrent vtextbfIdeo textbfGAN textbfInversion and etextbfDiting (RIGID)
Our framework learns the inherent coherence between input frames in an end-to-end manner.
arXiv Detail & Related papers (2023-08-11T12:17:24Z) - Intrinsic Temporal Regularization for High-resolution Human Video
Synthesis [59.54483950973432]
temporal consistency is crucial for extending image processing pipelines to the video domain.
We propose an effective intrinsic temporal regularization scheme, where an intrinsic confidence map is estimated via the frame generator to regulate motion estimation.
We apply our intrinsic temporal regulation to single-image generator, leading to a powerful " INTERnet" capable of generating $512times512$ resolution human action videos.
arXiv Detail & Related papers (2020-12-11T05:29:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.