Fused-Planes: Improving Planar Representations for Learning Large Sets of 3D Scenes
- URL: http://arxiv.org/abs/2410.23742v2
- Date: Fri, 31 Jan 2025 11:23:37 GMT
- Title: Fused-Planes: Improving Planar Representations for Learning Large Sets of 3D Scenes
- Authors: Karim Kassab, Antoine Schnepf, Jean-Yves Franceschi, Laurent Caraffa, Flavian Vasile, Jeremie Mary, Andrew Comport, Valérie Gouet-Brunet
- Abstract summary: We introduce Fused-Planes, a new planar architecture that improves the resource efficiency of Tri-Planes in the framework of learning large sets of scenes.
Our method divides the scene set into two subsets and operates as follows: (i) we train the first subset of scenes jointly with a compression model, (ii) we use that compression model to learn the remaining scenes.
This compression model consists of a 3D-aware latent space in which Fused-Planes are learned, enabling a reduced rendering resolution, and of structures shared across scenes that reduce scene representation complexity.
- Score: 8.847448988112903
- Abstract: To learn large sets of scenes, Tri-Planes are commonly employed for their planar structure, which enables interoperability with image models and thus diverse 3D applications. However, this advantage comes at the cost of resource efficiency, as Tri-Planes are not the most computationally efficient option. In this paper, we introduce Fused-Planes, a new planar architecture that improves the resource efficiency of Tri-Planes in the framework of learning large sets of scenes, which we call "multi-scene inverse graphics". To learn a large set of scenes, our method divides it into two subsets and operates as follows: (i) we train the first subset of scenes jointly with a compression model, (ii) we use that compression model to learn the remaining scenes. This compression model consists of a 3D-aware latent space in which Fused-Planes are learned, enabling a reduced rendering resolution, and of structures shared across scenes that reduce scene representation complexity. Fused-Planes present competitive resource costs in multi-scene inverse graphics while preserving Tri-Planes' rendering quality and maintaining their widely favored planar structure. Our codebase is publicly available as open-source. Our project page can be found at https://fused-planes.github.io .
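The two-stage recipe from the abstract can be summarized as code. Below is a minimal, hypothetical sketch of that control flow; the names (`FusedPlanes`, `CompressionModel`, `joint_train`, `fit_scene`) and the subset split ratio are illustrative placeholders, not the authors' implementation.

```python
"""Illustrative two-stage control flow for Fused-Planes (not the authors' code)."""
from dataclasses import dataclass, field
import numpy as np

@dataclass
class FusedPlanes:                   # per-scene planar representation (placeholder)
    params: np.ndarray = field(default_factory=lambda: np.zeros((3, 64, 64)))

@dataclass
class CompressionModel:              # shared 3D-aware latent space (placeholder)
    frozen: bool = False
    def freeze(self): self.frozen = True

def joint_train(planes, model):
    """Stage (i): jointly optimize the first scenes and the shared model."""
    assert not model.frozen          # gradient updates would happen here

def fit_scene(scene_id, model):
    """Stage (ii): fit one new scene inside the frozen 3D-aware latent space."""
    assert model.frozen              # reduced-resolution rendering happens here
    return FusedPlanes()

def learn_scene_set(scene_ids, first_fraction=0.1):  # split ratio is an assumption
    n = max(1, int(len(scene_ids) * first_fraction))
    first, rest = scene_ids[:n], scene_ids[n:]
    model = CompressionModel()
    planes = {sid: FusedPlanes() for sid in first}
    joint_train(planes, model)       # (i) joint training on the first subset
    model.freeze()
    for sid in rest:                 # (ii) reuse the compression model
        planes[sid] = fit_scene(sid, model)
    return planes, model

planes, model = learn_scene_set([f"scene_{i}" for i in range(100)])
print(len(planes), model.frozen)     # 100 True
```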
Related papers
- PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes [32.00236197233923]
PlanarSplatting is an ultra-fast and accurate surface reconstruction approach for multiview indoor images.
PlanarSplatting reconstructs an indoor scene in 3 minutes while achieving significantly better geometric accuracy than prior methods.
arXiv Detail & Related papers (2024-12-04T16:38:07Z)
- MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction [37.481945507799594]
This paper presents a generalizable 3D plane detection and reconstruction framework named MonoPlane.
We first leverage large-scale pre-trained neural networks to obtain the depth and surface normals from a single image.
These monocular geometric cues are then incorporated into a proximity-guided RANSAC framework to sequentially fit each plane instance.
arXiv Detail & Related papers (2024-11-02T12:15:29Z)
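As a rough illustration of the pipeline above, the following numpy sketch fits planes sequentially with a proximity-guided RANSAC: each hypothesis takes a seed point's predicted monocular normal as the plane normal, and only points near the seed may vote as inliers. The one-point hypothesis, thresholds, and neighborhood rule are assumptions made for brevity, not the paper's exact procedure.

```python
import numpy as np

def fit_planes(points, normals, n_planes=2, iters=200,
               dist_tol=0.02, radius=0.5, rng=np.random.default_rng(0)):
    """points, normals: (N, 3) arrays from a monocular depth/normal network."""
    remaining = np.arange(len(points))
    planes = []
    for _ in range(n_planes):
        best_inliers, best_plane = None, None
        for _ in range(iters):
            seed = rng.choice(remaining)
            p0, n0 = points[seed], normals[seed]
            # Proximity guidance: only points near the seed may vote.
            near = remaining[np.linalg.norm(points[remaining] - p0, axis=1) < radius]
            dist = np.abs((points[near] - p0) @ n0)        # point-to-plane distance
            inliers = near[dist < dist_tol]
            if best_inliers is None or len(inliers) > len(best_inliers):
                best_inliers, best_plane = inliers, (n0, -n0 @ p0)  # n.x + d = 0
        planes.append(best_plane)
        remaining = np.setdiff1d(remaining, best_inliers)  # fit the next plane on the rest
        if len(remaining) < 3:
            break
    return planes

# Synthetic check: two horizontal planes at z = 0 and z = 1.
rng = np.random.default_rng(1)
pts = np.concatenate([np.c_[rng.uniform(0, 1, (200, 2)), np.zeros(200)],
                      np.c_[rng.uniform(0, 1, (200, 2)), np.ones(200)]])
nrm = np.tile([0.0, 0.0, 1.0], (400, 1))
print(fit_planes(pts, nrm))                    # recovers d ≈ 0 and d ≈ -1
```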
- LT3SD: Latent Trees for 3D Scene Diffusion [71.91446143124648]
We present LT3SD, a novel latent diffusion model for large-scale 3D scene generation.
We demonstrate the efficacy and benefits of LT3SD for large-scale, high-quality unconditional 3D scene generation.
arXiv Detail & Related papers (2024-09-12T16:55:51Z)
- Disentangled 3D Scene Generation with Layout Learning [109.03233745767062]
We introduce a method to generate 3D scenes that are disentangled into their component objects.
Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene.
We show that, despite its simplicity, our approach successfully generates 3D scenes that decompose into individual objects.
arXiv Detail & Related papers (2024-02-26T18:54:15Z)
- BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation [51.030773085422034]
BlockFusion is a diffusion-based model that generates 3D scenes as unit blocks and seamlessly incorporates new blocks to extend the scene.
A 2D layout conditioning mechanism is used to control the placement and arrangement of scene elements.
Experimental results indicate that BlockFusion is capable of generating diverse, geometrically consistent and unbounded large 3D scenes.
arXiv Detail & Related papers (2024-01-30T14:34:19Z)
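A hedged sketch of the block-wise extension idea described above: the scene is a grid of unit blocks, and a new block is generated conditioned on its already-generated neighbors and a layout hint. The placeholder generator below just blends neighbor latents with noise; the actual model extrapolates latent tri-planes with diffusion, conditioned on a 2D layout map.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT = 16
scene = {(0, 0): rng.normal(size=LATENT)}          # blocks keyed by grid cell

def generate_block(neighbor_latents, layout_hint):
    # Placeholder: blend neighbor latents with noise. The real model runs
    # diffusion on latent tri-planes, conditioned on the overlap region and
    # a 2D layout map controlling element placement.
    ctx = np.mean(neighbor_latents, axis=0)
    return 0.8 * ctx + 0.2 * rng.normal(size=LATENT) + layout_hint

def extend(cell, layout_hint=0.0):
    i, j = cell
    nbrs = [scene[n] for n in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
            if n in scene]
    scene[cell] = generate_block(nbrs, layout_hint)

extend((0, 1)); extend((1, 1))                     # grow the scene block by block
print(sorted(scene))                               # [(0, 0), (0, 1), (1, 1)]
```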
- Convolutional Occupancy Models for Dense Packing of Complex, Novel Objects [75.54599721349037]
We present a fully-convolutional shape completion model, F-CON, that can be easily combined with off-the-shelf planning methods for dense packing in the real world.
We also release a simulated dataset, COB-3D-v2, that can be used to train shape completion models for real-world robotics applications.
Finally, we equip a real-world pick-and-place system with F-CON, and demonstrate dense packing of complex, unseen objects in cluttered scenes.
arXiv Detail & Related papers (2023-07-31T19:08:16Z)
- K-Planes: Explicit Radiance Fields in Space, Time, and Appearance [32.78595254330191]
We introduce k-planes, a white-box model for radiance fields in arbitrary dimensions.
Our model uses d choose 2 planes to represent a d-dimensional scene, providing a seamless way to go from static to dynamic scenes.
Across a range of synthetic and real, static and dynamic, fixed and varying appearance scenes, k-planes yields competitive and often state-of-the-art reconstruction fidelity.
arXiv Detail & Related papers (2023-01-24T18:59:08Z)
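A tiny numpy sketch of the factorization described above: a d-dimensional point is projected onto each of the C(d, 2) axis-aligned planes, a feature vector is sampled from each plane's grid, and the vectors are fused by elementwise product (the paper's multiplicative fusion). With d = 4, this yields exactly the six spacetime planes of HexPlane in the next entry. Grid resolution, feature width, and nearest-neighbor sampling are simplifications.

```python
from itertools import combinations
import numpy as np

d, res, feat = 4, 32, 8                            # d = 4 covers (x, y, z, t)
rng = np.random.default_rng(0)
planes = {ij: rng.normal(size=(res, res, feat))    # one feature grid per axis pair
          for ij in combinations(range(d), 2)}     # C(4, 2) = 6 planes

def features(x):
    """x: point in [0, 1)^d  ->  fused feature vector of length `feat`."""
    f = np.ones(feat)
    for (i, j), grid in planes.items():
        u, v = int(x[i] * res), int(x[j] * res)    # nearest-neighbor sampling
        f *= grid[u, v]                            # multiplicative (Hadamard) fusion
    return f

x = np.array([0.1, 0.5, 0.7, 0.3])                 # a spacetime sample
print(len(planes), features(x).shape)              # 6 (8,)
```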
- HexPlane: A Fast Representation for Dynamic Scenes [18.276921637560445]
We show that dynamic 3D scenes can be explicitly represented by six planes of learned features, leading to an elegant solution we call HexPlane.
A HexPlane computes features for points in spacetime by fusing vectors extracted from each plane, which is highly efficient.
arXiv Detail & Related papers (2023-01-23T18:59:25Z)
- Neural 3D Scene Reconstruction with the Manhattan-world Assumption [58.90559966227361]
This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images.
Planar constraints can be conveniently integrated into recent implicit neural representation-based reconstruction methods.
The proposed method outperforms previous methods by a large margin on 3D reconstruction quality.
arXiv Detail & Related papers (2022-05-05T17:59:55Z)
- 3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior [50.73148041205675]
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation.
We devise a new geometry-based strategy to embed depth information into a low-resolution voxel representation.
Our proposed geometric embedding works better than the depth feature learning used in conventional SSC frameworks.
arXiv Detail & Related papers (2020-03-31T09:33:46Z)
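As a concrete (if simplified) example of embedding depth into a low-resolution voxel representation, the sketch below back-projects a metric depth map through a pinhole camera and marks the hit cells of a coarse occupancy grid. The intrinsics, volume extent, and grid size are made-up assumptions; the paper's geometric embedding is more elaborate than plain occupancy.

```python
import numpy as np

def depth_to_voxels(depth, fx=320.0, fy=320.0, cx=160.0, cy=120.0,
                    grid=32, extent=4.0):
    """Back-project an (H, W) metric depth map into a coarse occupancy grid."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.ravel()
    x = (u.ravel() - cx) * z / fx              # pinhole back-projection
    y = (v.ravel() - cy) * z / fy
    # Voxel volume: x, y in [-extent/2, extent/2), z in [0, extent).
    ix = np.floor((x + extent / 2) / extent * grid).astype(int)
    iy = np.floor((y + extent / 2) / extent * grid).astype(int)
    iz = np.floor(z / extent * grid).astype(int)
    ok = (z > 0) & (ix >= 0) & (ix < grid) & (iy >= 0) & (iy < grid) & (iz < grid)
    occ = np.zeros((grid, grid, grid), dtype=bool)
    occ[ix[ok], iy[ok], iz[ok]] = True         # mark surface voxels as occupied
    return occ

occ = depth_to_voxels(np.full((240, 320), 2.0))    # a flat wall 2 m away
print(occ.sum())                                   # number of occupied voxels
```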
- Planar Prior Assisted PatchMatch Multi-View Stereo [32.41293572426403]
The completeness of 3D models is still a challenging problem in multi-view stereo.
Planar models are advantageous for depth estimation in low-textured areas.
PatchMatch multi-view stereo is highly efficient thanks to its sampling and propagation scheme.
arXiv Detail & Related papers (2019-12-26T01:34:05Z)
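The sampling-and-propagation scheme credited above for PatchMatch's efficiency can be demonstrated on a toy problem: initialize per-pixel depth hypotheses at random, then let each pixel repeatedly adopt a neighbor's hypothesis whenever it lowers the matching cost, interleaved with small random refinements. The cost function below is a synthetic stand-in; a real MVS cost compares warped image patches across views, and the planar-prior variant would additionally score agreement with fitted planes.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 48, 64
gt = np.repeat(np.linspace(1.0, 2.0, W)[None, :], H, axis=0)  # synthetic "true" depth

def cost(d):                     # stand-in for a multi-view photometric cost
    return np.abs(d - gt)

depth = rng.uniform(0.5, 2.5, size=(H, W))         # random initialization
c = cost(depth)
for _ in range(4):                                 # a few sweeps
    for shift, axis in ((1, 0), (-1, 0), (1, 1), (-1, 1)):
        cand = np.roll(depth, shift, axis=axis)    # propagate neighbor hypotheses
        cc = cost(cand)                            # (toy: wraps around at borders)
        better = cc < c
        depth[better], c[better] = cand[better], cc[better]
    cand = depth + rng.normal(0, 0.05, size=depth.shape)  # random refinement step
    cc = cost(cand)
    better = cc < c
    depth[better], c[better] = cand[better], cc[better]
print(float(np.abs(depth - gt).mean()))            # error shrinks with each sweep
```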