DM-NeRF: 3D Scene Geometry Decomposition and Manipulation from 2D Images
- URL: http://arxiv.org/abs/2208.07227v1
- Date: Mon, 15 Aug 2022 14:32:10 GMT
- Title: DM-NeRF: 3D Scene Geometry Decomposition and Manipulation from 2D Images
- Authors: Bing Wang, Lu Chen, Bo Yang
- Abstract summary: DM-NeRF is among the first to simultaneously reconstruct, decompose, manipulate and render complex 3D scenes in a single pipeline.
Our method can accurately decompose all 3D objects from 2D views, allowing any object of interest to be freely manipulated in 3D space.
- Score: 15.712721653893636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the problem of 3D scene geometry decomposition and
manipulation from 2D views. By leveraging the recent implicit neural
representation techniques, particularly the appealing neural radiance fields,
we introduce an object field component to learn unique codes for all individual
objects in 3D space only from 2D supervision. The key to this component is a
series of carefully designed loss functions to enable every 3D point,
especially in non-occupied space, to be effectively optimized even without 3D
labels. In addition, we introduce an inverse query algorithm to freely
manipulate any specified 3D object shape in the learned scene representation.
Notably, our manipulation algorithm can explicitly tackle key issues such as
object collisions and visual occlusions. Our method, called DM-NeRF, is among
the first to simultaneously reconstruct, decompose, manipulate and render
complex 3D scenes in a single pipeline. Extensive experiments on three datasets
clearly show that our method can accurately decompose all 3D objects from 2D
views, allowing any object of interest to be freely manipulated in 3D space through
translation, rotation, size adjustment, and deformation.
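The abstract mentions an inverse query algorithm but does not spell it out. The general idea behind inverse querying can be illustrated with a minimal sketch: to render an object at a new location, each query point in the edited scene is mapped back through the inverse of the edit and looked up in the unchanged scene representation. Everything below is an assumption made for illustration and is not taken from the paper: `toy_field` is a hand-written stand-in for a learned field, `query_translated` is a hypothetical helper, and only translation is covered. Collisions are not resolved here; the moved object simply takes precedence.

```python
from typing import Callable, Tuple

Point = Tuple[float, float, float]
Sample = Tuple[float, int]  # (density, object id); id 0 means empty space


def toy_field(p: Point) -> Sample:
    """Hypothetical stand-in for a learned scene field: two unit spheres."""
    x, y, z = p
    if x * x + y * y + z * z <= 1.0:
        return (1.0, 1)  # object 1: sphere centred at the origin
    if (x - 4.0) ** 2 + y * y + z * z <= 1.0:
        return (1.0, 2)  # object 2: sphere centred at x = 4
    return (0.0, 0)


def query_translated(
    field: Callable[[Point], Sample],
    target_id: int,
    offset: Point,
    p: Point,
) -> Sample:
    """Query the scene as if object `target_id` had been moved by `offset`."""
    # Map the query point back into the original, unedited scene.
    src = (p[0] - offset[0], p[1] - offset[1], p[2] - offset[2])
    moved = field(src)
    if moved[1] == target_id:
        # The moved object now covers p (it wins over anything already there).
        return moved
    here = field(p)
    if here[1] == target_id:
        # The object used to be here but has moved away: report empty space.
        return (0.0, 0)
    # All other content is left untouched.
    return here


# Move object 1 by two units along x and probe a few points.
offset = (2.0, 0.0, 0.0)
print(query_translated(toy_field, 1, offset, (2.0, 0.0, 0.0)))  # new location
print(query_translated(toy_field, 1, offset, (0.0, 0.0, 0.0)))  # vacated spot
print(query_translated(toy_field, 1, offset, (4.0, 0.0, 0.0)))  # object 2
```

The scene function itself is never modified; only the query is redirected. Rotation or scaling would follow the same pattern with a different inverse mapping.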
Related papers
- ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images [47.682942867405224]
ConDense is a framework for 3D pre-training utilizing existing 2D networks and large-scale multi-view datasets.
We propose a novel 2D-3D joint training scheme to extract co-embedded 2D and 3D features in an end-to-end pipeline.
arXiv Detail & Related papers (2024-08-30T05:57:01Z)
- SpatialTracker: Tracking Any 2D Pixels in 3D Space [71.58016288648447]
We propose to estimate point trajectories in 3D space to mitigate the issues caused by image projection.
Our method, named SpatialTracker, lifts 2D pixels to 3D using monocular depth estimators.
Tracking in 3D allows us to leverage as-rigid-as-possible (ARAP) constraints while simultaneously learning a rigidity embedding that clusters pixels into different rigid parts.
arXiv Detail & Related papers (2024-04-05T17:59:25Z)
- Neural 3D Scene Reconstruction from Multiple 2D Images without 3D Supervision [41.20504333318276]
We propose a novel neural reconstruction method that reconstructs scenes using sparse depth under the plane constraints without 3D supervision.
We introduce a signed distance function field, a color field, and a probability field to represent a scene.
We optimize these fields to reconstruct the scene by using differentiable ray marching with accessible 2D images as supervision.
arXiv Detail & Related papers (2023-06-30T13:30:48Z)
- Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion [115.82306502822412]
StyleGAN has achieved great progress in 2D face reconstruction and semantic editing via image inversion and latent editing.
A corresponding generic 3D GAN inversion framework is still missing, limiting the applications of 3D face reconstruction and semantic editing.
We study the challenging problem of 3D GAN inversion where a latent code is predicted given a single face image to faithfully recover its 3D shapes and detailed textures.
arXiv Detail & Related papers (2022-12-14T18:49:50Z)
- ONeRF: Unsupervised 3D Object Segmentation from Multiple Views [59.445957699136564]
ONeRF is a method that automatically segments and reconstructs object instances in 3D from multi-view RGB images without any additional manual annotations.
The segmented 3D objects are represented using separate Neural Radiance Fields (NeRFs) which allow for various 3D scene editing and novel view rendering.
arXiv Detail & Related papers (2022-11-22T06:19:37Z)
- MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z)
- Accelerate 3D Object Processing via Spectral Layout [1.52292571922932]
We propose to embed the essential information in a 3D object into 2D space via spectral layout.
The proposed method can achieve high-quality 2D representations for 3D objects, which enables the use of 2D-based methods to process 3D objects.
arXiv Detail & Related papers (2021-10-25T03:18:37Z)
- AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection [15.244852122106634]
We propose an approach for incorporating the shape-aware 2D/3D constraints into the 3D detection framework.
Specifically, we employ the deep neural network to learn distinguished 2D keypoints in the 2D image domain.
For generating the ground truth of 2D/3D keypoints, an automatic model-fitting approach has been proposed.
arXiv Detail & Related papers (2021-08-25T08:50:06Z)
- Learning geometry-image representation for 3D point cloud generation [5.3485743892868545]
We propose a novel geometry image based generator (GIG) to convert the 3D point cloud generation problem to a 2D geometry image generation problem.
Experiments on both rigid and non-rigid 3D object datasets have demonstrated the promising performance of our method.
arXiv Detail & Related papers (2020-11-29T05:21:10Z)
- Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation [87.54570024320354]
State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space.
A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space.
We develop a 3D cylinder partition and a 3D cylinder convolution based framework, termed as Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds.
arXiv Detail & Related papers (2020-08-04T13:56:19Z)
- Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion [53.885984328273686]
Implicit Feature Networks (IF-Nets) deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data.
IF-Nets clearly outperform prior work in 3D object reconstruction in ShapeNet, and obtain significantly more accurate 3D human reconstructions.
arXiv Detail & Related papers (2020-03-03T11:14:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.