Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors
- URL: http://arxiv.org/abs/2010.04030v5
- Date: Tue, 3 May 2022 08:54:37 GMT
- Title: Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors
- Authors: Cathrin Elich, Martin R. Oswald, Marc Pollefeys, Joerg Stueckler
- Abstract summary: PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of the 3D shape, pose, and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
- Score: 69.02332607843569
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Representing scenes at the granularity of objects is a prerequisite for scene understanding and decision making. We propose PriSMONet, a novel approach based on Prior Shape knowledge for learning Multi-Object 3D scene decomposition and representations from single images. Our approach learns to decompose images of synthetic scenes with multiple objects on a planar surface into their constituent scene objects and to infer their 3D properties from a single view. A recurrent encoder regresses a latent representation of the 3D shape, pose, and texture of each object from an input RGB image. By differentiable rendering, we train our model to decompose scenes from RGB-D images in a self-supervised way. The 3D shapes are represented continuously in function space as signed distance functions, which we pre-train from example shapes in a supervised way. These shape priors provide weak supervision signals that better condition the challenging overall learning task. We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out the benefits of the learned representation.
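The pipeline the abstract describes has two learned components. First, the recurrent encoder: it consumes the input RGB image once and then emits one object's latent code (shape, pose, texture) per recurrent step. A minimal PyTorch-style sketch of that mechanism follows; it is an illustration, not the authors' implementation, and every name, layer size, and the 7-DoF pose parameterization are assumptions.

```python
import torch
import torch.nn as nn

class RecurrentSceneEncoder(nn.Module):
    """Emits one object latent (shape, pose, texture) per recurrent step.

    Hypothetical sketch: the backbone, layer sizes, and the 7-DoF pose
    (3 translation + 4 quaternion) are assumptions, not the paper's design.
    """

    def __init__(self, feat_dim=256, z_shape=64, z_tex=32, pose_dim=7):
        super().__init__()
        self.backbone = nn.Sequential(          # crude CNN image encoder
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.rnn = nn.GRUCell(feat_dim, feat_dim)
        self.head = nn.Linear(feat_dim, z_shape + z_tex + pose_dim)
        self.splits = [z_shape, z_tex, pose_dim]

    def forward(self, rgb, num_objects):
        feat = self.backbone(rgb)               # (B, feat_dim) image features
        h = torch.zeros_like(feat)              # recurrent state
        objects = []
        for _ in range(num_objects):            # one object latent per step
            h = self.rnn(feat, h)
            z_shape, z_tex, pose = torch.split(
                self.head(h), self.splits, dim=-1)
            objects.append({"shape": z_shape, "texture": z_tex, "pose": pose})
        return objects
```

During training, the decoded objects would be composed by a differentiable renderer and the result compared against the input RGB-D frame, so the decomposition is learned without per-object annotations.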
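Second, the shape prior: a function-space decoder that maps a latent code and a 3D query point to a signed distance, pre-trained on example shapes with ground-truth SDF samples before scene-level training begins. Below is a DeepSDF-style sketch of that supervised pre-training stage; the auto-decoder setup, the clamping threshold, and all names are hypothetical, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

class SDFDecoder(nn.Module):
    """Maps (shape latent, 3D query point) -> signed distance."""

    def __init__(self, z_dim=64, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z, xyz):
        # z: (B, z_dim) latent per shape; xyz: (B, N, 3) query points
        z = z.unsqueeze(1).expand(-1, xyz.shape[1], -1)
        return self.net(torch.cat([z, xyz], dim=-1)).squeeze(-1)

def pretrain_step(decoder, latents, xyz, sdf_gt, opt, clamp=0.1):
    """One supervised step: regress clamped SDF values of example shapes.

    `latents` is a learnable embedding with one code per training shape,
    optimized jointly with the decoder (auto-decoder style, an assumption).
    """
    pred = decoder(latents, xyz)
    loss = (pred.clamp(-clamp, clamp)
            - sdf_gt.clamp(-clamp, clamp)).abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Once pre-trained, this decoder constrains the scene encoder to predict shape codes that already correspond to plausible objects, which is the weak supervision signal the abstract refers to.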
Related papers
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z)
- Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture [47.44029968307207]
We propose a novel framework for simultaneous high-fidelity recovery of object shapes and textures from single-view images.
Our approach utilizes the proposed Single-view neural implicit Shape and Radiance field (SSR) representations to leverage both explicit 3D shape supervision and volume rendering.
A distinctive feature of our framework is its ability to generate fine-grained textured meshes while seamlessly integrating rendering capabilities into the single-view 3D reconstruction model.
arXiv Detail & Related papers (2023-11-01T11:46:15Z)
- Differentiable Blocks World: Qualitative 3D Decomposition by Rendering Primitives [70.32817882783608]
We present an approach that produces a simple, compact, and actionable 3D world representation by means of 3D primitives.
Unlike existing primitive decomposition methods that rely on 3D input data, our approach operates directly on images.
We show that the resulting textured primitives faithfully reconstruct the input images and accurately model the visible 3D points.
arXiv Detail & Related papers (2023-07-11T17:58:31Z)
- 3D Surface Reconstruction in the Wild by Deforming Shape Priors from Synthetic Data [24.97027425606138]
Reconstructing the underlying 3D surface of an object from a single image is a challenging problem.
We present a new method for joint category-specific 3D reconstruction and object pose estimation from a single image.
Our approach achieves state-of-the-art reconstruction performance across several real-world datasets.
arXiv Detail & Related papers (2023-02-24T20:37:27Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- 3DP3: 3D Scene Perception via Probabilistic Programming [28.491817202574932]
3DP3 is a framework for inverse graphics that uses inference in a structured generative model of objects, scenes, and images.
Our results demonstrate that 3DP3 is more accurate at 6DoF object pose estimation from real images than deep learning baselines.
arXiv Detail & Related papers (2021-10-30T19:10:34Z)
- Sparse Pose Trajectory Completion [87.31270669154452]
We propose a method that completes sparse pose trajectories, learning even from a dataset in which objects appear only in sparsely sampled views.
This is achieved with a cross-modal pose trajectory transfer mechanism.
Our method is evaluated on the Pix3D and ShapeNet datasets.
arXiv Detail & Related papers (2021-05-01T00:07:21Z)