Scaled Inverse Graphics: Efficiently Learning Large Sets of 3D Scenes
- URL: http://arxiv.org/abs/2410.23742v1
- Date: Thu, 31 Oct 2024 08:58:00 GMT
- Title: Scaled Inverse Graphics: Efficiently Learning Large Sets of 3D Scenes
- Authors: Karim Kassab, Antoine Schnepf, Jean-Yves Franceschi, Laurent Caraffa, Flavian Vasile, Jeremie Mary, Andrew Comport, Valérie Gouet-Brunet
- Abstract summary: We introduce a framework termed "scaled inverse graphics", aimed at efficiently learning large sets of scene representations.
It operates in two stages: (i) training a compression model on a subset of scenes, then (ii) training NeRF models on the resulting smaller representations.
In practice, we compact the representation of scenes by learning NeRFs in a latent space, which reduces image resolution, and by sharing information across scenes, which reduces per-scene NeRF representation complexity.
- Abstract: While the field of inverse graphics has been witnessing continuous growth, techniques devised thus far predominantly focus on learning individual scene representations. In contrast, learning large sets of scenes has been a considerable bottleneck in NeRF developments, as repeatedly applying inverse graphics on a sequence of scenes, though essential for various applications, remains largely prohibitive in terms of resource costs. We introduce a framework termed "scaled inverse graphics", aimed at efficiently learning large sets of scene representations, and propose a novel method to this end. It operates in two stages: (i) training a compression model on a subset of scenes, then (ii) training NeRF models on the resulting smaller representations, thereby reducing the optimization space per new scene. In practice, we compact the representation of scenes by learning NeRFs in a latent space to reduce the image resolution, and sharing information across scenes to reduce NeRF representation complexity. We experimentally show that our method presents both the lowest training time and memory footprint in scaled inverse graphics compared to other methods applied independently on each scene. Our codebase is publicly available as open source. Our project page can be found at https://scaled-ig.github.io.
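The two-stage pipeline in the abstract can be sketched in a few lines, with a PCA-style linear autoencoder standing in for the paper's learned compression model. The function names, array sizes, and the PCA stand-in are all illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage (i): fit a shared "compression model" on images from a subset of
# scenes. Here a PCA basis plays the role of the learned latent space.
def fit_encoder(images, latent_dim):
    flat = images.reshape(len(images), -1)
    mean = flat.mean(axis=0)
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:latent_dim]          # basis: (latent_dim, H*W)

def encode(images, mean, basis):
    return (images.reshape(len(images), -1) - mean) @ basis.T

def decode(latents, mean, basis):
    return latents @ basis + mean

# Pretend training set: 32 flattened 16x16 images from the scene subset.
subset = rng.normal(size=(32, 16 * 16))
mean, basis = fit_encoder(subset, latent_dim=8)

# Stage (ii): a new scene is then optimized against its *latent* targets,
# a much smaller space than full-resolution pixels.
new_scene_images = rng.normal(size=(4, 16 * 16))
latent_targets = encode(new_scene_images, mean, basis)
```

The point of the sketch is the size reduction: per-scene optimization then targets the 8-dimensional latents rather than the 256-pixel images, a 32x smaller space in this toy setup.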
Related papers
- SCARF: Scalable Continual Learning Framework for Memory-efficient Multiple Neural Radiance Fields [9.606992888590757]
We build on Neural Radiance Fields (NeRF), which uses a multi-layer perceptron to model the density and radiance field of a scene as an implicit function.
We propose an uncertain surface knowledge distillation strategy to transfer the radiance field knowledge of previous scenes to the new model.
Experiments show that the proposed approach achieves state-of-the-art rendering quality of continual learning NeRF on NeRF-Synthetic, LLFF, and TanksAndTemples datasets.
arXiv Detail & Related papers (2024-09-06T03:36:12Z) - DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z) - FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage different and relatively small-sized, open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z) - VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction [59.40711222096875]
We present VastGaussian, the first method for high-quality reconstruction and real-time rendering on large scenes based on 3D Gaussian Splatting.
Our approach outperforms existing NeRF-based methods and achieves state-of-the-art results on multiple large scene datasets.
arXiv Detail & Related papers (2024-02-27T11:40:50Z) - BirdNeRF: Fast Neural Reconstruction of Large-Scale Scenes From Aerial Imagery [3.4956406636452626]
We introduce BirdNeRF, an adaptation of Neural Radiance Fields (NeRF) designed specifically for reconstructing large-scale scenes using aerial imagery.
We present a novel bird-view pose-based spatial decomposition algorithm that decomposes a large aerial image set into multiple small sets with appropriately sized overlaps.
We evaluate our approach on existing datasets as well as on our own drone footage, improving reconstruction speed by 10x over classical photogrammetry software and by 50x over state-of-the-art large-scale NeRF solutions.
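The bird-view spatial decomposition summarized above (splitting a large aerial image set into small overlapping subsets by camera position) can be sketched as a gridding step; the tile count, overlap margin, and function names are hypothetical, not taken from the paper:

```python
import numpy as np

def decompose(cam_xy, n_tiles, overlap):
    """Split cameras into an n_tiles x n_tiles grid of overlapping subsets
    by bird-view (x, y) position; each subset would train its own NeRF."""
    lo, hi = cam_xy.min(axis=0), cam_xy.max(axis=0)
    size = (hi - lo) / n_tiles
    tiles = []
    for i in range(n_tiles):
        for j in range(n_tiles):
            t_lo = lo + size * np.array([i, j]) - overlap
            t_hi = lo + size * np.array([i + 1, j + 1]) + overlap
            inside = np.all((cam_xy >= t_lo) & (cam_xy <= t_hi), axis=1)
            tiles.append(np.flatnonzero(inside))
    return tiles

rng = np.random.default_rng(1)
cams = rng.uniform(0, 100, size=(200, 2))   # bird-view camera positions
tiles = decompose(cams, n_tiles=2, overlap=5.0)
```

Because the tiles overlap, cameras near tile borders appear in several subsets, which is what lets the per-tile reconstructions be stitched without seams.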
arXiv Detail & Related papers (2024-02-07T03:18:34Z) - Adaptive Voronoi NeRFs [9.973103531980838]
Neural Radiance Fields learn to represent a 3D scene from just a set of registered images.
We show that a hierarchy of Voronoi diagrams is a suitable choice to partition the scene.
By equipping each Voronoi cell with its own NeRF, our approach is able to quickly learn a scene representation.
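The cell-per-NeRF idea above reduces, at query time, to routing each 3D sample point to the NeRF of its nearest Voronoi seed; this nearest-seed assignment is a minimal sketch with illustrative names and sizes, not the paper's hierarchical scheme:

```python
import numpy as np

def assign_cells(points, seeds):
    """Map each 3D sample point to the index of its nearest Voronoi seed;
    the NeRF owning that cell would then evaluate the point."""
    d = np.linalg.norm(points[:, None, :] - seeds[None, :, :], axis=-1)
    return d.argmin(axis=1)

rng = np.random.default_rng(2)
seeds = rng.uniform(-1, 1, size=(8, 3))       # one seed per Voronoi cell
samples = rng.uniform(-1, 1, size=(1000, 3))  # points sampled along rays
cells = assign_cells(samples, seeds)
```

Each small NeRF only ever sees points from its own cell, which is what makes the per-cell models quick to fit.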
arXiv Detail & Related papers (2023-03-28T14:16:08Z) - AligNeRF: High-Fidelity Neural Radiance Fields via Alignment-Aware Training [100.33713282611448]
We conduct the first pilot study on training NeRF with high-resolution data.
We propose corresponding solutions, including combining the multilayer perceptron with convolutional layers.
Our approach is nearly cost-free, introducing no noticeable training or testing overhead.
arXiv Detail & Related papers (2022-11-17T17:22:28Z) - Compressible-composable NeRF via Rank-residual Decomposition [21.92736190195887]
Neural Radiance Field (NeRF) has emerged as a compelling method to represent 3D objects and scenes for photo-realistic rendering.
We present a neural representation that enables efficient and convenient manipulation of models.
Our method is able to achieve comparable rendering quality to state-of-the-art methods, while enabling extra capability of compression and composition.
arXiv Detail & Related papers (2022-05-30T06:18:59Z) - Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning [44.412145665354736]
We introduce a novel contrastive action representation learning framework to learn frame-wise action representations.
Inspired by the recent progress of self-supervised learning, we present a novel sequence contrastive loss (SCL) applied on two correlated views.
Our approach also shows outstanding performance on video alignment and fine-grained frame retrieval tasks.
arXiv Detail & Related papers (2022-03-28T17:59:54Z) - StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis [68.3787368024951]
We propose a novel approach for multi-modal image-to-image (I2I) translation.
We learn a latent embedding, jointly with the generator, that models the variability of the output domain.
Specifically, we pre-train a generic style encoder using a novel proxy task to learn an embedding of images, from arbitrary domains, into a low-dimensional style latent space.
arXiv Detail & Related papers (2021-04-14T19:58:24Z) - IBRNet: Learning Multi-View Image-Based Rendering [67.15887251196894]
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views.
By drawing on source views at render time, our method hearkens back to classic work on image-based rendering.
arXiv Detail & Related papers (2021-02-25T18:56:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.