VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression
- URL: http://arxiv.org/abs/2412.11362v1
- Date: Mon, 16 Dec 2024 01:28:04 GMT
- Title: VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression
- Authors: Qiang Hu, Houqiang Zhong, Zihan Zheng, Xiaoyun Zhang, Zhengxue Cheng, Li Song, Guangtao Zhai, Yanfeng Wang,
- Abstract summary: NeRF-based video has revolutionized visual media by delivering photorealistic Free-Viewpoint Video (FVV) experiences.<n>The substantial data volumes pose significant challenges for storage and transmission.<n>We propose VRVVC, a novel end-to-end joint variable-rate framework for video compression.
- Score: 59.14355576912495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Radiance Field (NeRF)-based volumetric video has revolutionized visual media by delivering photorealistic Free-Viewpoint Video (FVV) experiences that provide audiences with unprecedented immersion and interactivity. However, the substantial data volumes pose significant challenges for storage and transmission. Existing solutions typically optimize NeRF representation and compression independently or focus on a single fixed rate-distortion (RD) tradeoff. In this paper, we propose VRVVC, a novel end-to-end joint optimization variable-rate framework for volumetric video compression that achieves variable bitrates using a single model while maintaining superior RD performance. Specifically, VRVVC introduces a compact tri-plane implicit residual representation for inter-frame modeling of long-duration dynamic scenes, effectively reducing temporal redundancy. We further propose a variable-rate residual representation compression scheme that leverages a learnable quantization and a tiny MLP-based entropy model. This approach enables variable bitrates through the utilization of predefined Lagrange multipliers to manage the quantization error of all latent representations. Finally, we present an end-to-end progressive training strategy combined with a multi-rate-distortion loss function to optimize the entire framework. Extensive experiments demonstrate that VRVVC achieves a wide range of variable bitrates within a single model and surpasses the RD performance of existing methods across various datasets.
Related papers
- QuantVSR: Low-Bit Post-Training Quantization for Real-World Video Super-Resolution [53.13952833016505]
We propose a low-bit quantization model for real-world video super-resolution (VSR)<n>We use a calibration dataset to measure both spatial and temporal complexity for each layer.<n>We refine the FP and low-bit branches to achieve simultaneous optimization.
arXiv Detail & Related papers (2025-08-06T14:35:59Z) - MSNeRV: Neural Video Representation with Multi-Scale Feature Fusion [27.621656985302973]
Implicit Neural representations (INRs) have emerged as a promising approach for video compression.<n>Existing INR-based methods struggle to effectively represent detail-intensive and fast-changing video content.<n>We propose a multi-scale feature fusion framework, MSNeRV, for neural video representation.
arXiv Detail & Related papers (2025-06-18T08:57:12Z) - FCA2: Frame Compression-Aware Autoencoder for Modular and Fast Compressed Video Super-Resolution [68.77813885751308]
State-of-the-art (SOTA) compressed video super-resolution (CVSR) models face persistent challenges, including prolonged inference time, complex training pipelines, and reliance on auxiliary information.<n>We propose an efficient and scalable solution inspired by the structural and statistical similarities between hyperspectral images (HSI) and video data.<n>Our approach introduces a compression-driven dimensionality reduction strategy that reduces computational complexity, accelerates inference, and enhances the extraction of temporal information across frames.
arXiv Detail & Related papers (2025-06-13T07:59:52Z) - Multi-View Learning with Context-Guided Receptance for Image Denoising [18.175992709188026]
Image denoising is essential in low-level vision applications such as photography and automated driving.<n>Existing methods struggle with distinguishing complex noise patterns in real-world scenes and consume significant computational resources.<n>In this work, a Context-guided Receptance Weighted Key-Value (M) model is proposed, combining enhanced multi-view feature integration with efficient sequence modeling.<n>The model is validated on multiple real-world image denoising datasets, outperforming the existing state-of-the-art methods quantitatively and reducing inference time up to 40%.
arXiv Detail & Related papers (2025-05-05T14:57:43Z) - DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer [56.98400572837792]
DiVE produces high-fidelity, temporally coherent, and cross-view consistent multi-view videos.
These innovations collectively achieve a 2.62x speedup with minimal quality degradation.
arXiv Detail & Related papers (2025-04-28T09:20:50Z) - Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression [90.59962443790593]
In this paper, we present a variable-rate image compression model based on invertible transform to overcome limitations.
Specifically, we design a lightweight multi-scale invertible neural network, which maps the input image into multi-scale latent representations.
Experimental results demonstrate that the proposed method achieves state-of-the-art performance compared to existing variable-rate methods.
arXiv Detail & Related papers (2025-03-27T09:08:39Z) - 4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video [56.04182926886754]
3D Gaussian Splatting (3DGS) has substantial potential for enabling photorealistic Free-Viewpoint Video (FVV) experiences.
Existing methods typically handle dynamic 3DGS representation and compression separately, motion information and the rate-distortion trade-off during training.
We propose 4DGC, a rate-aware 4D Gaussian compression framework that significantly reduces storage size while maintaining superior RD performance for FVV.
arXiv Detail & Related papers (2025-03-24T08:05:27Z) - UAR-NVC: A Unified AutoRegressive Framework for Memory-Efficient Neural Video Compression [29.174318150967405]
Implicit Neural Representations (INRs) have demonstrated significant potential in video compression by representing videos as neural networks.
We present a novel understanding of INR models from an autoregressive (AR) perspective and introduce a Unified AutoRegressive Framework for memory-efficient Neural Video Compression (UAR-NVC)
UAR-NVC integrates timeline-based and INR-based neural video compression under a unified autoregressive paradigm.
arXiv Detail & Related papers (2025-03-04T15:54:57Z) - HPC: Hierarchical Progressive Coding Framework for Volumetric Video [39.403294185116]
Volumetric video based on Neural Radiance Field (NeRF) holds vast potential for various 3D applications.
Current NeRF compression lacks the flexibility to adjust video quality and within a single model for various network and device capacities.
We propose HPC, a novel hierarchical progressive video coding framework achieving variable using a single model.
arXiv Detail & Related papers (2024-07-12T06:34:24Z) - Spatial-Temporal Transformer based Video Compression Framework [44.723459144708286]
We propose a novel Spatial-Temporal Transformer based Video Compression (STT-VC) framework.
It contains a Relaxed Deformable Transformer (RDT) with Uformer based offsets estimation for motion estimation and compensation, a Multi-Granularity Prediction (MGP) module based on multi-reference frames for prediction refinement, and a Spatial Feature Distribution prior based Transformer (SFD-T) for efficient temporal-spatial joint residual compression.
Experimental results demonstrate that our method achieves the best result with 13.5% BD-Rate saving over VTM.
arXiv Detail & Related papers (2023-09-21T09:23:13Z) - Cross-Consistent Deep Unfolding Network for Adaptive All-In-One Video
Restoration [78.14941737723501]
We propose a Cross-consistent Deep Unfolding Network (CDUN) for All-In-One VR.
By orchestrating two cascading procedures, CDUN achieves adaptive processing for diverse degradations.
In addition, we introduce a window-based inter-frame fusion strategy to utilize information from more adjacent frames.
arXiv Detail & Related papers (2023-09-04T14:18:00Z) - Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos [69.22032459870242]
We present a novel technique, Residual Radiance Field or ReRF, as a highly compact neural representation to achieve real-time free-view rendering on long-duration dynamic scenes.
We show such a strategy can handle large motions without sacrificing quality.
Based on ReRF, we design a special FVV that achieves three orders of magnitudes compression rate and provides a companion ReRF player to support online streaming of long-duration FVVs of dynamic scenes.
arXiv Detail & Related papers (2023-04-10T08:36:00Z) - Lossy Image Compression with Conditional Diffusion Models [25.158390422252097]
This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models.
In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model.
Our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics.
arXiv Detail & Related papers (2022-09-14T21:53:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.