MonoScene: Monocular 3D Semantic Scene Completion
- URL: http://arxiv.org/abs/2112.00726v1
- Date: Wed, 1 Dec 2021 18:59:57 GMT
- Title: MonoScene: Monocular 3D Semantic Scene Completion
- Authors: Anh-Quan Cao, Raoul de Charette
- Abstract summary: MonoScene proposes a 3D Semantic Scene Completion (SSC) framework, where the dense geometry and semantics of a scene are inferred from a single monocular image.
Our framework relies on successive 2D and 3D UNets bridged by a novel 2D-3D feature projection inspired by optics.
- Score: 9.92186106077902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: MonoScene proposes a 3D Semantic Scene Completion (SSC) framework, where the
dense geometry and semantics of a scene are inferred from a single monocular
RGB image. Different from the SSC literature, which relies on 2.5D or 3D input, we
solve the complex problem of 2D to 3D scene reconstruction while jointly
inferring its semantics. Our framework relies on successive 2D and 3D UNets
bridged by a novel 2D-3D feature projection inspired by optics, and
introduces a 3D context relation prior to enforce spatio-semantic consistency.
Along with architectural contributions, we introduce novel global scene and
local frustum losses. Experiments show we outperform the literature on all
metrics and datasets while hallucinating plausible scenery even beyond the
camera field of view. Our code and trained models are available at
https://github.com/cv-rits/MonoScene
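The abstract's central architectural idea, a 2D-3D feature projection "inspired by optics", can be pictured as back-projecting 2D UNet features along camera rays into the 3D volume: each voxel centre is projected onto the image plane and picks up the 2D feature at that pixel. The sketch below is an illustrative reconstruction of that idea, not code from the linked repository; the function name, tensor shapes, and the pinhole-intrinsics assumption are illustrative only.

```python
# Illustrative sketch (not the repository's actual API) of an optics-inspired
# 2D-3D feature projection: every 3D voxel centre is projected onto the image
# plane with the camera intrinsics, and the 2D UNet feature map is bilinearly
# sampled at that pixel, so all voxels on the same optical ray share a feature.
import torch
import torch.nn.functional as F

def lift_2d_features_to_3d(feat_2d, voxel_centers_cam, K):
    """
    feat_2d:            (1, C, H, W) feature map from the 2D UNet.
    voxel_centers_cam:  (N, 3) voxel centres in camera coordinates (z > 0 in front).
    K:                  (3, 3) pinhole camera intrinsics (assumed, for illustration).
    returns:            (N, C) one 2D feature vector per voxel.
    """
    _, C, H, W = feat_2d.shape

    # Pinhole projection: pixel = K @ point, then divide by depth.
    pix = (K @ voxel_centers_cam.T).T                  # (N, 3)
    uv = pix[:, :2] / pix[:, 2:3].clamp(min=1e-6)      # (N, 2) pixel coordinates

    # Normalise pixel coordinates to [-1, 1] for grid_sample (x: width, y: height).
    grid = torch.empty_like(uv)
    grid[:, 0] = 2.0 * uv[:, 0] / (W - 1) - 1.0
    grid[:, 1] = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = grid.view(1, 1, -1, 2)                      # (1, 1, N, 2)

    # Bilinearly sample the 2D features at each projected voxel centre.
    sampled = F.grid_sample(feat_2d, grid, align_corners=True)  # (1, C, 1, N)
    feats = sampled.view(C, -1).T                      # (N, C)

    # Voxels projecting outside the image or behind the camera get zero features
    # (one simple choice); the 3D network must then hallucinate them.
    inside = (uv[:, 0] >= 0) & (uv[:, 0] <= W - 1) & \
             (uv[:, 1] >= 0) & (uv[:, 1] <= H - 1) & (voxel_centers_cam[:, 2] > 0)
    return feats * inside.unsqueeze(1)
```

The per-voxel features returned here would then be reshaped into a dense feature volume and refined by the 3D UNet; since every voxel on the same optical ray receives identical 2D evidence, resolving depth and occlusions is left to the 3D network.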
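Similarly, the "local frustum losses" mentioned in the abstract suggest supervising class statistics inside camera sub-frustums rather than only per voxel. The following is a hedged sketch of one such loss, assuming the image is divided into grid cells whose frustums group voxels, and that predicted and ground-truth class proportions per frustum are compared with a KL divergence; the paper's exact formulation may differ, and all names below are placeholders.

```python
import torch
import torch.nn.functional as F

def frustum_proportion_loss(logits, target, frustum_index, num_frustums, num_classes):
    """
    logits:        (N_vox, num_classes) per-voxel class scores.
    target:        (N_vox,) ground-truth class index per voxel.
    frustum_index: (N_vox,) which local frustum each voxel falls into
                   (e.g. obtained by projecting voxel centres into an l x l image grid).
    Returns the mean KL divergence between ground-truth and predicted
    class proportions, computed per frustum.
    """
    probs = logits.softmax(dim=-1)                     # (N_vox, C)
    onehot = F.one_hot(target, num_classes).float()    # (N_vox, C)

    losses = []
    for f in range(num_frustums):
        mask = frustum_index == f
        if not mask.any():
            continue
        # Class proportions inside this frustum.
        p_pred = probs[mask].mean(dim=0).clamp(min=1e-8)
        p_gt = onehot[mask].mean(dim=0)
        # KL(gt || pred) over classes.
        kl = (p_gt * (p_gt.clamp(min=1e-8).log() - p_pred.log())).sum()
        losses.append(kl)
    return torch.stack(losses).mean()
```

In a full training setup such a frustum-local term would be combined with the global scene-level losses and a standard per-voxel classification loss; the sketch only illustrates the frustum-local part.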
Related papers
- Fake It To Make It: Virtual Multiviews to Enhance Monocular Indoor Semantic Scene Completion [0.8669877024051931]
Monocular Indoor Semantic Scene Completion aims to reconstruct a 3D semantic occupancy map from a single RGB image of an indoor scene.
We introduce an innovative approach that leverages novel view synthesis and multiview fusion.
We demonstrate IoU score improvements of up to 2.8% for Scene Completion and 4.9% for Semantic Scene Completion when integrated with existing SSC networks on the NYUv2 dataset.
arXiv Detail & Related papers (2025-03-07T02:09:38Z)
- PanoSLAM: Panoptic 3D Scene Reconstruction via Gaussian SLAM [105.01907579424362]
PanoSLAM is the first SLAM system to integrate geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation within a unified framework.
For the first time, it achieves panoptic 3D reconstruction of open-world environments directly from RGB-D video.
arXiv Detail & Related papers (2024-12-31T08:58:10Z)
- MOSE: Monocular Semantic Reconstruction Using NeRF-Lifted Noisy Priors [11.118490283303407]
We propose a neural field semantic reconstruction approach to lift inferred image-level noisy priors to 3D.
Our method produces accurate semantics and geometry in both 3D and 2D space.
arXiv Detail & Related papers (2024-09-21T05:12:13Z)
- Denoising Diffusion via Image-Based Rendering [54.20828696348574]
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
arXiv Detail & Related papers (2024-02-05T19:00:45Z)
- SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z)
- BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation [96.58789785954409]
We propose a practical and efficient 3D representation that incorporates an equivariant radiance field with the guidance of a bird's-eye view map.
We produce large-scale, even infinite-scale, 3D scenes via synthesizing local scenes and then stitching them with smooth consistency.
arXiv Detail & Related papers (2023-12-04T18:56:10Z)
- BUOL: A Bottom-Up Framework with Occupancy-aware Lifting for Panoptic 3D Scene Reconstruction From A Single Image [33.126045619754365]
BUOL is a bottom-up framework with occupancy-aware lifting for panoptic 3D scene reconstruction from a single image.
Our method shows a tremendous performance advantage over state-of-the-art methods on the synthetic 3D-Front dataset and the real-world Matterport3D dataset.
arXiv Detail & Related papers (2023-06-01T17:56:49Z)
- SSR-2D: Semantic 3D Scene Reconstruction from 2D Images [54.46126685716471]
In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction without using any 3D annotations.
The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images.
Our method achieves state-of-the-art semantic scene completion performance on two large-scale benchmark datasets, MatterPort3D and ScanNet.
arXiv Detail & Related papers (2023-02-07T17:47:52Z)
- SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections [49.802462165826554]
We present SceneDreamer, an unconditional generative model for unbounded 3D scenes.
Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations.
arXiv Detail & Related papers (2023-02-02T18:59:16Z)
- Learning 3D Scene Priors with 2D Supervision [37.79852635415233]
We propose a new method to learn 3D scene priors of layout and shape without requiring any 3D ground truth.
Our method represents a 3D scene as a latent vector, from which we progressively decode a sequence of objects characterized by their class categories.
Experiments on 3D-FRONT and ScanNet show that our method outperforms the state of the art in single-view reconstruction.
arXiv Detail & Related papers (2022-11-25T15:03:32Z)
- CompNVS: Novel View Synthesis with Scene Completion [83.19663671794596]
We propose a generative pipeline performing on a sparse grid-based neural scene representation to complete unobserved scene parts.
We process encoded image features in 3D space with a geometry completion network and a subsequent texture inpainting network to extrapolate the missing area.
Photorealistic image sequences can finally be obtained via consistency-relevant differentiable rendering.
arXiv Detail & Related papers (2022-07-23T09:03:13Z)
- NeSF: Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes [25.26518805603798]
NeSF is a method for producing 3D semantic fields from posed RGB images alone.
Our method is the first to offer truly dense 3D scene segmentations requiring only 2D supervision for training.
arXiv Detail & Related papers (2021-11-25T21:44:54Z)
- Curiosity-driven 3D Scene Structure from Single-image Self-supervision [22.527696847086574]
Previous work has demonstrated learning isolated 3D objects from 2D-only self-supervision.
Here we set out to extend this to entire 3D scenes made out of multiple objects, including their location, orientation and type.
The resulting system converts 2D images of different virtual or real environments into complete 3D scenes, learned only from 2D images of those scenes.
arXiv Detail & Related papers (2020-12-02T14:17:16Z)