Scaled Inverse Graphics: Efficiently Learning Large Sets of 3D Scenes
- URL: http://arxiv.org/abs/2410.23742v1
- Date: Thu, 31 Oct 2024 08:58:00 GMT
- Title: Scaled Inverse Graphics: Efficiently Learning Large Sets of 3D Scenes
- Authors: Karim Kassab, Antoine Schnepf, Jean-Yves Franceschi, Laurent Caraffa, Flavian Vasile, Jeremie Mary, Andrew Comport, Valérie Gouet-Brunet
- Abstract summary: We introduce a framework termed "scaled inverse graphics", aimed at efficiently learning large sets of scene representations.
It operates in two stages: (i) training a compression model on a subset of scenes, then (ii) training NeRF models on the resulting smaller representations.
In practice, we compact the representation of scenes by learning NeRFs in a latent space to reduce the image resolution, and sharing information across scenes to reduce NeRF representation complexity.
- Score: 8.847448988112903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While the field of inverse graphics has been witnessing continuous growth, techniques devised thus far predominantly focus on learning individual scene representations. In contrast, learning large sets of scenes has been a considerable bottleneck in NeRF developments, as repeatedly applying inverse graphics on a sequence of scenes, though essential for various applications, remains largely prohibitive in terms of resource costs. We introduce a framework termed "scaled inverse graphics", aimed at efficiently learning large sets of scene representations, and propose a novel method to this end. It operates in two stages: (i) training a compression model on a subset of scenes, then (ii) training NeRF models on the resulting smaller representations, thereby reducing the optimization space per new scene. In practice, we compact the representation of scenes by learning NeRFs in a latent space to reduce the image resolution, and sharing information across scenes to reduce NeRF representation complexity. We experimentally show that our method presents both the lowest training time and memory footprint in scaled inverse graphics compared to other methods applied independently on each scene. Our codebase is publicly available as open-source. Our project page can be found at https://scaled-ig.github.io .
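As a rough illustration of the two-stage recipe above, the toy sketch below first trains a small autoencoder on a scene subset, then fits a per-scene field against the resulting latent images while reusing a trunk shared across scenes. All module sizes, names, and losses are illustrative assumptions, not the paper's actual architecture.

```python
# Toy two-stage sketch (PyTorch, random data); sizes and the shared/per-scene
# split are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stage (i): train a compression model (a tiny autoencoder) on a scene subset,
# so later fields are supervised at a reduced latent resolution.
ae = nn.Sequential(
    nn.Conv2d(3, 8, 4, stride=4), nn.ReLU(),              # 64x64 RGB -> 16x16x8
    nn.ConvTranspose2d(8, 3, 4, stride=4), nn.Sigmoid(),  # decode back to RGB
)
subset = torch.rand(16, 3, 64, 64)  # stand-in for images from the scene subset
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    F.mse_loss(ae(subset), subset).backward()
    opt.step()

# Stage (ii): per new scene, fit a small field against *latent* images.
# The random trunk stands in for parameters shared across scenes; only the
# head is optimized per scene, shrinking the optimization space.
with torch.no_grad():
    latent = ae[:2](subset[:1])                        # (1, 8, 16, 16) target
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 16),
                        torch.linspace(-1, 1, 16), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)  # (256, 2) query points
shared_trunk = nn.Sequential(nn.Linear(2, 64), nn.ReLU())  # frozen, shared
scene_head = nn.Linear(64, 8)                          # the only per-scene part
target = latent[0].permute(1, 2, 0).reshape(-1, 8)
opt2 = torch.optim.Adam(scene_head.parameters(), lr=1e-2)
for _ in range(500):
    opt2.zero_grad()
    F.mse_loss(scene_head(shared_trunk(coords)), target).backward()
    opt2.step()
```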
Related papers
- PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes [32.00236197233923]
PlanarSplatting is an ultra-fast and accurate surface reconstruction approach for multiview indoor images.
PlanarSplatting reconstructs an indoor scene in 3 minutes while having significantly better geometric accuracy.
arXiv Detail & Related papers (2024-12-04T16:38:07Z)
- MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction [37.481945507799594]
This paper presents a generalizable 3D plane detection and reconstruction framework named MonoPlane.
We first leverage large-scale pre-trained neural networks to obtain the depth and surface normals from a single image.
These monocular geometric cues are then incorporated into a proximity-guided RANSAC framework to sequentially fit each plane instance.
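The sketch below illustrates sequential, proximity-guided RANSAC plane fitting on a back-projected point cloud; the Gaussian proximity weighting, thresholds, and peeling loop are illustrative assumptions rather than MonoPlane's exact procedure.

```python
# Minimal sequential RANSAC with proximity-guided sampling (NumPy).
import numpy as np

def fit_plane(p3):  # plane through 3 points -> (unit normal, offset)
    n = np.cross(p3[1] - p3[0], p3[2] - p3[0])
    n /= np.linalg.norm(n) + 1e-12
    return n, -n @ p3[0]

def sequential_ransac(points, n_planes=2, iters=200, thresh=0.02, sigma=0.5):
    rng = np.random.default_rng(0)
    remaining, planes = points.copy(), []
    for _ in range(n_planes):
        best_score, best = -1, None
        for _ in range(iters):
            seed = remaining[rng.integers(len(remaining))]
            # proximity guidance: draw the minimal set near the seed point
            w = np.exp(-np.linalg.norm(remaining - seed, axis=1) / sigma)
            idx = rng.choice(len(remaining), 3, replace=False, p=w / w.sum())
            n, d = fit_plane(remaining[idx])
            inliers = np.abs(remaining @ n + d) < thresh
            if inliers.sum() > best_score:
                best_score, best = inliers.sum(), (n, d, inliers)
        n, d, inliers = best
        planes.append((n, d))
        remaining = remaining[~inliers]  # peel off, then fit the next instance
        if len(remaining) < 3:
            break
    return planes

# toy usage: two noisy axis-aligned planes
pts = np.concatenate([
    np.c_[np.random.rand(200), np.random.rand(200), 0.01 * np.random.randn(200)],
    np.c_[np.random.rand(200), 0.01 * np.random.randn(200), np.random.rand(200)],
])
print(sequential_ransac(pts))
```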
arXiv Detail & Related papers (2024-11-02T12:15:29Z)
- LT3SD: Latent Trees for 3D Scene Diffusion [71.91446143124648]
We present LT3SD, a novel latent diffusion model for large-scale 3D scene generation.
We demonstrate the efficacy and benefits of LT3SD for large-scale, high-quality unconditional 3D scene generation.
arXiv Detail & Related papers (2024-09-12T16:55:51Z)
- SCARF: Scalable Continual Learning Framework for Memory-efficient Multiple Neural Radiance Fields [9.606992888590757]
We build on Neural Radiance Fields (NeRF), which use a multi-layer perceptron to model the density and radiance field of a scene as an implicit function.
We propose an uncertain surface knowledge distillation strategy to transfer the radiance field knowledge of previous scenes to the new model.
Experiments show that the proposed approach achieves state-of-the-art rendering quality of continual learning NeRF on NeRF-Synthetic, LLFF, and TanksAndTemples datasets.
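A hedged sketch of the distillation idea: the new model is supervised by the new scene's data while a frozen previous model's predictions, weighted by a stand-in confidence term, preserve earlier scenes. The confidence proxy below is an assumption, not SCARF's uncertain-surface estimate.

```python
# Hedged radiance-field distillation sketch (PyTorch, toy fields).
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 4))  # frozen old model
student = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 4))  # new continual model
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for _ in range(100):
    x_old = torch.rand(256, 3)  # points in previously learned scenes
    x_new = torch.rand(256, 3)  # points with fresh ground-truth supervision
    y_new = torch.rand(256, 4)  # stand-in (rgb, sigma) targets for the new scene
    with torch.no_grad():
        t = teacher(x_old)
        conf = torch.sigmoid(t[:, 3:])  # stand-in per-point confidence weight
    distill = (conf * (student(x_old) - t).pow(2)).mean()  # keep old scenes
    opt.zero_grad()
    (F.mse_loss(student(x_new), y_new) + distill).backward()
    opt.step()
```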
arXiv Detail & Related papers (2024-09-06T03:36:12Z)
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage different and relatively small-sized, open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z)
- VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction [59.40711222096875]
We present VastGaussian, the first method for high-quality reconstruction and real-time rendering on large scenes based on 3D Gaussian Splatting.
Our approach outperforms existing NeRF-based methods and achieves state-of-the-art results on multiple large scene datasets.
arXiv Detail & Related papers (2024-02-27T11:40:50Z)
- Disentangled 3D Scene Generation with Layout Learning [109.03233745767062]
We introduce a method to generate 3D scenes that are disentangled into their component objects.
Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene.
We show that despite its simplicity, our approach successfully generates 3D scenes disentangled into individual objects.
arXiv Detail & Related papers (2024-02-26T18:54:15Z)
- BirdNeRF: Fast Neural Reconstruction of Large-Scale Scenes From Aerial Imagery [3.4956406636452626]
We introduce BirdNeRF, an adaptation of Neural Radiance Fields (NeRF) designed specifically for reconstructing large-scale scenes using aerial imagery.
We present a novel bird-view pose-based spatial decomposition algorithm that decomposes a large aerial image set into multiple small sets with appropriately sized overlaps.
We evaluate our approach on existing datasets as well as on our own drone footage, improving reconstruction speed by 10x over classical photogrammetry software and by 50x over a state-of-the-art large-scale NeRF solution.
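The pose-based decomposition can be pictured as tiling the ground plane by camera position and growing each tile so neighbors overlap; the grid layout and overlap ratio below are illustrative assumptions, not BirdNeRF's algorithm.

```python
# Splitting a large aerial capture into overlapping sub-sets by camera
# position (NumPy); illustrative sketch only.
import numpy as np

def decompose_by_pose(cam_xy, n_tiles=4, overlap=0.15):
    """Partition cameras into n_tiles x n_tiles ground tiles, each grown by
    `overlap` so neighboring sub-scenes share images for seamless merging."""
    lo, hi = cam_xy.min(0), cam_xy.max(0)
    size = (hi - lo) / n_tiles
    groups = []
    for i in range(n_tiles):
        for j in range(n_tiles):
            t_lo = lo + size * np.array([i, j]) - overlap * size
            t_hi = lo + size * np.array([i + 1, j + 1]) + overlap * size
            mask = np.all((cam_xy >= t_lo) & (cam_xy <= t_hi), axis=1)
            if mask.any():
                groups.append(np.flatnonzero(mask))
    return groups  # train one small NeRF per group, then composite renders

cams = np.random.rand(500, 2) * 1000.0  # stand-in bird-view camera positions
print([len(g) for g in decompose_by_pose(cams)])
```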
arXiv Detail & Related papers (2024-02-07T03:18:34Z)
- BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation [51.030773085422034]
BlockFusion is a diffusion-based model that generates 3D scenes as unit blocks and seamlessly incorporates new blocks to extend the scene.
A 2D layout conditioning mechanism is used to control the placement and arrangement of scene elements.
Experimental results indicate that BlockFusion is capable of generating diverse, geometrically consistent and unbounded large 3D scenes.
arXiv Detail & Related papers (2024-01-30T14:34:19Z)
- Convolutional Occupancy Models for Dense Packing of Complex, Novel Objects [75.54599721349037]
We present a fully-convolutional shape completion model, F-CON, that can be easily combined with off-the-shelf planning methods for dense packing in the real world.
We also release a simulated dataset, COB-3D-v2, that can be used to train shape completion models for real-world robotics applications.
Finally, we equip a real-world pick-and-place system with F-CON, and demonstrate dense packing of complex, unseen objects in cluttered scenes.
arXiv Detail & Related papers (2023-07-31T19:08:16Z)
- Adaptive Voronoi NeRFs [9.973103531980838]
Neural Radiance Fields learn to represent a 3D scene from just a set of registered images.
We show that a hierarchy of Voronoi diagrams is a suitable choice to partition the scene.
By equipping each Voronoi cell with its own NeRF, our approach is able to quickly learn a scene representation.
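A minimal sketch of the routing idea, without the hierarchy: each Voronoi cell owns its own model, and a query point is evaluated only by the model of its nearest site. The stub models below stand in for the per-cell NeRFs.

```python
# Flat-Voronoi routing sketch (NumPy); sites and stub models are stand-ins.
import numpy as np

sites = np.random.rand(8, 3)  # Voronoi cell centers
cell_models = [lambda x, c=c: np.full(4, float(c)) for c in range(8)]  # stub NeRFs

def query(x):
    cell = int(np.argmin(np.linalg.norm(sites - x, axis=1)))  # nearest site wins
    return cell_models[cell](x)  # evaluate only that cell's small model

print(query(np.random.rand(3)))
```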
arXiv Detail & Related papers (2023-03-28T14:16:08Z)
- K-Planes: Explicit Radiance Fields in Space, Time, and Appearance [32.78595254330191]
We introduce k-planes, a white-box model for radiance fields in arbitrary dimensions.
Our model uses d choose 2 planes to represent a d-dimensional scene, providing a seamless way to go from static to dynamic scenes.
Across a range of synthetic and real, static and dynamic, fixed and varying appearance scenes, k-planes yields competitive and often state-of-the-art reconstruction fidelity.
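The d-choose-2 factorization is easy to make concrete: a static scene (d = 3) needs 3 planes, while a dynamic scene with time (d = 4) needs 6, which matches the HexPlane representation in the next entry.

```python
# d-choose-2 plane counts, made concrete.
from itertools import combinations

for axes in ("xyz", "xyzt"):
    planes = ["".join(p) for p in combinations(axes, 2)]
    print(f"d={len(axes)}: {len(planes)} planes -> {planes}")
# d=3: 3 planes -> ['xy', 'xz', 'yz']
# d=4: 6 planes -> ['xy', 'xz', 'xt', 'yz', 'yt', 'zt']
```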
arXiv Detail & Related papers (2023-01-24T18:59:08Z)
- HexPlane: A Fast Representation for Dynamic Scenes [18.276921637560445]
We show that dynamic 3D scenes can be explicitly represented by six planes of learned features, leading to an elegant solution we call HexPlane.
A HexPlane computes features for points in spacetime by fusing vectors extracted from each plane, which is highly efficient.
arXiv Detail & Related papers (2023-01-23T18:59:25Z)
- AligNeRF: High-Fidelity Neural Radiance Fields via Alignment-Aware Training [100.33713282611448]
We conduct the first pilot study on training NeRF with high-resolution data.
We propose the corresponding solutions, including marrying the multilayer perceptron with convolutional layers.
Our approach comes nearly for free, introducing no obvious training/testing costs.
arXiv Detail & Related papers (2022-11-17T17:22:28Z)
- Compressible-composable NeRF via Rank-residual Decomposition [21.92736190195887]
Neural Radiance Field (NeRF) has emerged as a compelling method to represent 3D objects and scenes for photo-realistic rendering.
We present a neural representation that enables efficient and convenient manipulation of models.
Our method is able to achieve comparable rendering quality to state-of-the-art methods, while enabling extra capability of compression and composition.
arXiv Detail & Related papers (2022-05-30T06:18:59Z)
- Neural 3D Scene Reconstruction with the Manhattan-world Assumption [58.90559966227361]
This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images.
Planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods.
The proposed method outperforms previous methods by a large margin on 3D reconstruction quality.
arXiv Detail & Related papers (2022-05-05T17:59:55Z)
- Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning [44.412145665354736]
We introduce a novel contrastive action representation learning framework to learn frame-wise action representations.
Inspired by the recent progress of self-supervised learning, we present a novel sequence contrastive loss (SCL) applied on two correlated views.
Our approach also shows outstanding performance on video alignment and fine-grained frame retrieval tasks.
arXiv Detail & Related papers (2022-03-28T17:59:54Z)
- StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis [68.3787368024951]
We propose a novel approach for multi-modal Image-to-image (I2I) translation.
We learn a latent embedding, jointly with the generator, that models the variability of the output domain.
Specifically, we pre-train a generic style encoder using a novel proxy task to learn an embedding of images, from arbitrary domains, into a low-dimensional style latent space.
arXiv Detail & Related papers (2021-04-14T19:58:24Z)
- IBRNet: Learning Multi-View Image-Based Rendering [67.15887251196894]
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views.
By drawing on source views at render time, our method hearkens back to classic work on image-based rendering.
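A simplified stand-in for the classic image-based-rendering step it builds on: choose the source views whose directions are closest to the novel view and blend their colors with softmax weights. IBRNet itself learns this aggregation along rays; the weighting below is only a sketch.

```python
# Nearest-view color blending (NumPy); a classic IBR baseline, not IBRNet.
import numpy as np

def blend_views(target_dir, src_dirs, src_colors, k=4, temp=0.1):
    sim = src_dirs @ target_dir           # cosine similarity to the novel view
    top = np.argsort(-sim)[:k]            # k nearest source views
    w = np.exp(sim[top] / temp)
    w /= w.sum()
    return w @ src_colors[top]            # weighted color blend

dirs = np.random.randn(10, 3)
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
colors = np.random.rand(10, 3)
print(blend_views(np.array([0.0, 0.0, 1.0]), dirs, colors))
```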
arXiv Detail & Related papers (2021-02-25T18:56:21Z)
- 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior [50.73148041205675]
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation.
We propose to devise a new geometry-based strategy to embed depth information with low-resolution voxel representation.
Our proposed geometric embedding works better than the depth feature learning from habitual SSC frameworks.
arXiv Detail & Related papers (2020-03-31T09:33:46Z)
- Planar Prior Assisted PatchMatch Multi-View Stereo [32.41293572426403]
The completeness of 3D models is still a challenging problem in multi-view stereo.
Planar models are advantageous to the depth estimation of low-textured areas.
PatchMatch multi-view stereo is very efficient for its sampling and propagation scheme.
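Why planar models help in low-textured areas can be seen from the plane-induced depth: once a pixel is explained by a plane, its depth follows analytically with no photometric matching. The intrinsics and plane below are toy values.

```python
# Plane-induced depth for a pixel (NumPy): for plane n·X + d = 0 and ray
# X = depth * K^-1 [u, v, 1], we get depth = -d / (n · K^-1 [u, v, 1]).
import numpy as np

def plane_depth(u, v, n, d, K):
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # back-projected pixel ray
    return -d / (n @ ray)                           # depth where ray meets plane

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
n, d = np.array([0.0, 0.0, 1.0]), -2.0              # plane z = 2 in camera frame
print(plane_depth(320, 240, n, d, K))               # central pixel -> depth 2.0
```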
arXiv Detail & Related papers (2019-12-26T01:34:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.