HPC: Hierarchical Progressive Coding Framework for Volumetric Video
- URL: http://arxiv.org/abs/2407.09026v2
- Date: Sat, 3 Aug 2024 02:22:34 GMT
- Title: HPC: Hierarchical Progressive Coding Framework for Volumetric Video
- Authors: Zihan Zheng, Houqiang Zhong, Qiang Hu, Xiaoyun Zhang, Li Song, Ya Zhang, Yanfeng Wang
- Abstract summary: Volumetric video based on Neural Radiance Field (NeRF) holds vast potential for various 3D applications.
Current NeRF compression lacks the flexibility to adjust video quality and bitrate within a single model for various network and device capacities.
We propose HPC, a novel hierarchical progressive volumetric video coding framework achieving variable bitrate using a single model.
- Score: 39.403294185116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Volumetric video based on Neural Radiance Field (NeRF) holds vast potential for various 3D applications, but its substantial data volume poses significant challenges for compression and transmission. Current NeRF compression lacks the flexibility to adjust video quality and bitrate within a single model for various network and device capacities. To address these issues, we propose HPC, a novel hierarchical progressive volumetric video coding framework achieving variable bitrate using a single model. Specifically, HPC introduces a hierarchical representation with a multi-resolution residual radiance field to reduce temporal redundancy in long-duration sequences while simultaneously generating various levels of detail. Then, we propose an end-to-end progressive learning approach with a multi-rate-distortion loss function to jointly optimize both hierarchical representation and compression. Our HPC trained only once can realize multiple compression levels, while the current methods need to train multiple fixed-bitrate models for different rate-distortion (RD) tradeoffs. Extensive experiments demonstrate that HPC achieves flexible quality levels with variable bitrate by a single model and exhibits competitive RD performance, even outperforming fixed-bitrate models across various datasets.
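The abstract's two core ideas, a base signal refined by residual levels and a loss that sums a rate-distortion term over every level, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the use of mean squared error as the distortion, the per-level weights `lambdas`, and the choice to keep every level at the same resolution (HPC uses a multi-resolution residual radiance field) are all assumptions made for illustration, and the rates are passed in as plain numbers rather than estimated by an entropy model.

```python
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length signals."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def progressive_reconstructions(
    base: Sequence[float], residuals: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Return one reconstruction per level: base, base + r1, base + r1 + r2, ...

    Hypothetical simplification: all levels share one resolution, so a
    residual is added element-wise without any upsampling step.
    """
    current = list(base)
    levels = [list(current)]
    for residual in residuals:
        if len(residual) != len(current):
            raise ValueError("residual length must match the base length")
        current = [c + r for c, r in zip(current, residual)]
        levels.append(list(current))
    return levels


def multi_rate_distortion_loss(
    target: Sequence[float],
    base: Sequence[float],
    residuals: Sequence[Sequence[float]],
    rates: Sequence[float],
    lambdas: Sequence[float],
) -> float:
    """Sum of D_l + lambda_l * R_l over all levels l.

    D_l is the distortion of the level-l reconstruction and R_l is the
    cumulative rate needed to decode up to level l. The exact form used
    by HPC may differ; this is a generic multi-level RD objective.
    """
    levels = progressive_reconstructions(base, residuals)
    if not (len(levels) == len(rates) == len(lambdas)):
        raise ValueError("need one rate and one lambda per level")
    total = 0.0
    cumulative_rate = 0.0
    for recon, rate, lam in zip(levels, rates, lambdas):
        cumulative_rate += rate  # decoding level l requires all lower levels
        total += mse(target, recon) + lam * cumulative_rate
    return total
```

Because every level contributes its own term, a single optimization pass pushes all levels toward a usable quality at their respective rates, which is the property that lets one trained model serve several bitrates by truncating the stream after any level.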
Related papers
- Point Cloud Geometry Scalable Coding Using a Resolution and Quality-conditioned Latents Probability Estimator [47.792286013837945]
This paper focuses on the development of scalable coding solutions for deep learning-based Point Cloud (PC) coding.
The peculiarities of this 3D representation make it hard to implement flexible solutions that do not compromise the other functionalities of the software.
arXiv Detail & Related papers (2025-02-19T20:58:53Z)
- GoDe: Gaussians on Demand for Progressive Level of Detail and Scalable Compression [13.616981296093932]
We propose a novel, model-agnostic technique that organizes Gaussians into several hierarchical layers.
This method, combined with recent 3DGS compression approaches, allows a single model to instantly scale across several compression ratios.
We validate our approach on typical datasets and benchmarks, showcasing low distortion and substantial gains in terms of scalability and adaptability.
arXiv Detail & Related papers (2025-01-23T11:05:45Z)
- Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers [55.87192133758051]
Diffusion Transformers (DiTs) have achieved state-of-the-art (SOTA) image generation quality but suffer from high latency and memory inefficiency.
We propose DiffRatio-MoD, a dynamic DiT inference framework with differentiable compression ratios.
arXiv Detail & Related papers (2024-12-22T02:04:17Z)
- VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression [59.14355576912495]
NeRF-based video has revolutionized visual media by delivering photorealistic Free-Viewpoint Video (FVV) experiences.
The substantial data volumes pose significant challenges for storage and transmission.
We propose VRVVC, a novel end-to-end joint variable-rate framework for volumetric video compression.
arXiv Detail & Related papers (2024-12-16T01:28:04Z)
- High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space trained in an end-to-end fashion.
We simplify and speed up the training by using a single multiscale spectrogram adversary.
arXiv Detail & Related papers (2022-10-24T17:52:02Z)
- Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
arXiv Detail & Related papers (2022-02-09T18:48:02Z)
- Multi-Density Attention Network for Loop Filtering in Video Compression [9.322800480045336]
We propose an online-scaling-based multi-density attention network for loop filtering in video compression.
Experimental results show that a 10.18% bit-rate reduction at the same video quality can be achieved over the latest Versatile Video Coding (VVC) standard.
arXiv Detail & Related papers (2021-04-08T05:46:38Z)
- Generalized Octave Convolutions for Learned Multi-Frequency Image Compression [20.504561050200365]
We propose the first learned multi-frequency image compression and entropy coding approach.
It is based on the recently developed octave convolutions to factorize the latents into high and low frequency (resolution) components.
We show that the proposed generalized octave convolution can improve the performance of other auto-encoder-based computer vision tasks.
arXiv Detail & Related papers (2020-02-24T01:35:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.