WavePlanes: A compact Wavelet representation for Dynamic Neural Radiance Fields
- URL: http://arxiv.org/abs/2312.02218v3
- Date: Wed, 8 May 2024 13:24:32 GMT
- Title: WavePlanes: A compact Wavelet representation for Dynamic Neural Radiance Fields
- Authors: Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull
- Abstract summary: This paper presents WavePlanes, a fast and more compact explicit model.
We propose a multi-scale space and space-time feature plane representation using N-level 2-D wavelet coefficients.
Exploiting the sparsity of wavelet coefficients, we compress the model using a Hash Map.
- Score: 9.158626732325915
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dynamic Neural Radiance Fields (Dynamic NeRF) enhance NeRF technology to model moving scenes. However, they are resource intensive and challenging to compress. To address these issues, this paper presents WavePlanes, a fast and more compact explicit model. We propose a multi-scale space and space-time feature plane representation using N-level 2-D wavelet coefficients. The inverse discrete wavelet transform reconstructs feature signals at varying detail, which are linearly decoded to approximate the color and density of volumes in a 4-D grid. Exploiting the sparsity of wavelet coefficients, we compress the model using a Hash Map containing only non-zero coefficients and their locations on each plane. Compared to the state-of-the-art (SotA) plane-based models, WavePlanes is up to 15x smaller while being less resource demanding and competitive in performance and training time. Compared to other small SotA models WavePlanes preserves details better without requiring custom CUDA code or high performance computing resources. Our code is available at: https://github.com/azzarelli/waveplanes/
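The compression idea in the abstract — transform a feature plane to wavelet coefficients, then store only the non-zero coefficients and their locations in a hash map — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it substitutes a single-level 2-D Haar transform for the N-level wavelet decomposition, and a plain Python dict for the hash map; `compress` and `decompress` are hypothetical names.

```python
import numpy as np

def haar2d(plane):
    """Single-level 2-D Haar transform: returns (LL, LH, HL, HH) subbands."""
    a = plane[0::2, 0::2]; b = plane[0::2, 1::2]
    c = plane[1::2, 0::2]; d = plane[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # coarse approximation
    lh = (a - b + c - d) / 2.0  # horizontal detail
    hl = (a + b - c - d) / 2.0  # vertical detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d: reconstructs the feature plane from its subbands."""
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w))
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out

def compress(plane, thresh=0.5):
    """Keep only detail coefficients above `thresh`, stored sparsely in a
    dict keyed by (subband, row, col) -- the hash-map idea from the paper."""
    ll, lh, hl, hh = haar2d(plane)
    table = {}
    for band, coeffs in zip(("LH", "HL", "HH"), (lh, hl, hh)):
        for (r, c), v in np.ndenumerate(coeffs):
            if abs(v) > thresh:
                table[(band, r, c)] = v
    return ll, table, lh.shape

def decompress(ll, table, shape):
    """Scatter stored coefficients back into dense subbands and invert."""
    bands = {b: np.zeros(shape) for b in ("LH", "HL", "HH")}
    for (band, r, c), v in table.items():
        bands[band][r, c] = v
    return ihaar2d(ll, bands["LH"], bands["HL"], bands["HH"])
```

Smooth feature planes produce mostly near-zero detail coefficients, so the table typically holds far fewer entries than the plane has pixels, which is where the compression comes from.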
Related papers
- UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion [51.31220416754788]
We present UDiFF, a 3D diffusion model for unsigned distance fields (UDFs) which is capable to generate textured 3D shapes with open surfaces from text conditions or unconditionally.
Our key idea is to generate UDFs in spatial-frequency domain with an optimal wavelet transformation, which produces a compact representation space for UDF generation.
arXiv Detail & Related papers (2024-04-10T09:24:54Z) - Make-A-Shape: a Ten-Million-scale 3D Shape Model [52.701745578415796]
This paper introduces Make-A-Shape, a new 3D generative model designed for efficient training on a vast scale.
We first innovate a wavelet-tree representation to compactly encode shapes by formulating the subband coefficient filtering scheme.
We derive the subband adaptive training strategy to train our model to effectively learn to generate coarse and detail wavelet coefficients.
arXiv Detail & Related papers (2024-01-20T00:21:58Z) - Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models [83.35835521670955]
Surf-D is a novel method for generating high-quality 3D shapes as Surfaces with arbitrary topologies.
We use the Unsigned Distance Field (UDF) as our surface representation to accommodate arbitrary topologies.
We also propose a new pipeline that employs a point-based AutoEncoder to learn a compact and continuous latent space for accurately encoding UDF.
arXiv Detail & Related papers (2023-11-28T18:56:01Z) - Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models [89.76587063609806]
We study the denoising diffusion probabilistic model (DDPM) in wavelet space, instead of pixel space, for visual synthesis.
By explicitly modeling the wavelet signals, we find our model is able to generate images with higher quality on several datasets.
arXiv Detail & Related papers (2023-07-27T06:53:16Z) - Efficient Large-scale Scene Representation with a Hybrid of High-resolution Grid and Plane Features [44.25307397334988]
Existing neural radiance fields (NeRF) methods for large-scale scene modeling require days of training using multiple GPUs.
We introduce a new and efficient hybrid feature representation for NeRF that fuses the 3D hash-grids and high-resolution 2D dense plane features.
Based on this hybrid representation, we propose a fast optimization NeRF variant, called GP-NeRF, that achieves better rendering results while maintaining a compact model size.
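The hybrid representation described above amounts to two lookups whose features are concatenated. The sketch below is an illustrative simplification, not GP-NeRF's implementation: it uses an Instant-NGP-style spatial hash and nearest-neighbour sampling in place of learned multi-resolution grids and bilinear interpolation, and the function names are hypothetical.

```python
import numpy as np

PRIMES = (1, 2654435761, 805459861)  # Instant-NGP-style hashing primes

def hash_grid_lookup(table, xyz, res):
    """Feature lookup for a 3-D point quantized to a res^3 grid."""
    ix = tuple(int(v * res) for v in xyz)
    h = 0
    for coord, prime in zip(ix, PRIMES):
        h ^= coord * prime
    return table[h % len(table)]

def plane_lookup(plane, uv):
    """Nearest-neighbour sample of a dense 2-D feature plane over [0,1]^2."""
    h, w, _ = plane.shape
    r = min(int(uv[0] * h), h - 1)
    c = min(int(uv[1] * w), w - 1)
    return plane[r, c]

def hybrid_feature(table, plane, xyz, res=16):
    """Concatenate hash-grid and plane features for one query point."""
    g = hash_grid_lookup(table, xyz, res)
    p = plane_lookup(plane, xyz[:2])  # e.g. the xy plane
    return np.concatenate([g, p])
```

The concatenated feature would then be fed to a small decoder; the point of the hybrid is that the hash grid captures fine local detail while the dense plane provides smooth, collision-free coverage at scale.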
arXiv Detail & Related papers (2023-03-06T10:04:50Z) - K-Planes: Explicit Radiance Fields in Space, Time, and Appearance [32.78595254330191]
We introduce k-planes, a white-box model for radiance fields in arbitrary dimensions.
Our model uses d choose 2 planes to represent a d-dimensional scene, providing a seamless way to go from static to dynamic scenes.
Across a range of synthetic and real, static and dynamic, fixed and varying appearance scenes, k-planes yields competitive and often state-of-the-art reconstruction fidelity.
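The "d choose 2" factorization is easy to make concrete: each pair of axes indexes one 2-D feature plane, so a static scene (x, y, z) needs 3 planes and a dynamic scene (x, y, z, t) needs 6. The snippet below only enumerates the planes; interpolation and feature decoding are omitted, and the space/space-time split mirrors the grouping that WavePlanes builds on.

```python
from itertools import combinations

def plane_axes(axes):
    """Enumerate the C(d, 2) axis pairs that index the 2-D feature planes."""
    return list(combinations(axes, 2))

static_planes = plane_axes("xyz")    # 3 planes for a static scene
dynamic_planes = plane_axes("xyzt")  # 6 planes for a dynamic scene

# Split into purely spatial and space-time planes:
space = [p for p in dynamic_planes if "t" not in p]
space_time = [p for p in dynamic_planes if "t" in p]
```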
arXiv Detail & Related papers (2023-01-24T18:59:08Z) - Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning [138.29273453811945]
Multi-scale Vision Transformer (ViT) has emerged as a powerful backbone for computer vision tasks.
We propose a new Wavelet Vision Transformer (Wave-ViT) that formulates invertible down-sampling with wavelet transforms and self-attention learning.
arXiv Detail & Related papers (2022-07-11T16:03:51Z) - WaveMix: A Resource-efficient Neural Network for Image Analysis [3.4927288761640565]
WaveMix is resource-efficient and yet generalizable and scalable.
WaveMix networks achieve comparable or better accuracy than state-of-the-art convolutional neural networks.
WaveMix establishes new benchmarks for segmentation on Cityscapes.
arXiv Detail & Related papers (2022-05-28T09:08:50Z) - RGB-D Saliency Detection via Cascaded Mutual Information Minimization [122.8879596830581]
Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning.
We introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data.
arXiv Detail & Related papers (2021-09-15T12:31:27Z) - High-Fidelity and Low-Latency Universal Neural Vocoder based on Multiband WaveRNN with Data-Driven Linear Prediction for Discrete Waveform Modeling [38.828260316517536]
This paper presents a novel universal neural vocoder framework based on multiband WaveRNN with data-driven linear prediction for discrete waveform modeling (MWDLP).
Experiments demonstrate that the proposed MWDLP framework generates high-fidelity synthetic speech for seen and unseen speakers and/or languages, trained on data from 300 speakers covering clean and noisy/reverberant conditions.
arXiv Detail & Related papers (2021-05-20T16:02:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.