Towards Scalable Multi-View Reconstruction of Geometry and Materials
- URL: http://arxiv.org/abs/2306.03747v1
- Date: Tue, 6 Jun 2023 15:07:39 GMT
- Title: Towards Scalable Multi-View Reconstruction of Geometry and Materials
- Authors: Carolin Schmitt, Božidar Antić, Andrei Neculai, Joo Ho Lee, and Andreas Geiger
- Abstract summary: We propose a novel method for joint recovery of camera pose, object geometry and spatially-varying Bidirectional Reflectance Distribution Function (svBRDF) of 3D scenes.
The inputs are high-resolution RGB-D images captured by a mobile, hand-held capture system with point lights for active illumination.
- Score: 27.660389147094715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel method for joint recovery of camera pose,
object geometry and spatially-varying Bidirectional Reflectance Distribution
Function (svBRDF) of 3D scenes that exceed object-scale and hence cannot be
captured with stationary light stages. The inputs are high-resolution RGB-D
images captured by a mobile, hand-held capture system with point lights for
active illumination. Compared to previous works that jointly estimate geometry
and materials from a hand-held scanner, we formulate this problem using a
single objective function that can be minimized using off-the-shelf
gradient-based solvers. To facilitate scalability to large numbers of
observation views and optimization variables, we introduce a distributed
optimization algorithm that reconstructs 2.5D keyframe-based representations of
the scene. A novel multi-view consistency regularizer effectively synchronizes
neighboring keyframes such that the local optimization results allow for
seamless integration into a globally consistent 3D model. We provide a study on
the importance of each component in our formulation and show that our method
compares favorably to baselines. We further demonstrate that our method
accurately reconstructs various objects and materials and allows for expansion
to spatially larger scenes. We believe that this work represents a significant
step towards making geometry and material estimation from hand-held scanners
scalable.
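As a rough illustration of the idea in the abstract (per-keyframe estimates tied together by a multi-view consistency regularizer and minimized with plain gradient steps), here is a toy sketch. It is not the authors' algorithm: the quadratic data term, the quadratic consistency term, the 1D "depth" vectors, and the names `optimize_keyframes`, `disagreement`, `lam`, and `step` are all assumptions made for illustration, and everything runs in a single process rather than being distributed.

```python
import numpy as np


def optimize_keyframes(observations, neighbors, lam=1.0, step=0.1, iters=500):
    """Minimize sum_k ||x_k - y_k||^2 + lam * sum_(i,j) ||x_i - x_j||^2.

    observations: array of shape (K, n), one noisy estimate y_k per keyframe.
    neighbors:    list of (i, j) index pairs of keyframes that should agree.
    """
    x = observations.copy()
    for _ in range(iters):
        # Gradient of the per-keyframe data term.
        grad = 2.0 * (x - observations)
        # Gradient of the consistency term between neighboring keyframes.
        for i, j in neighbors:
            diff = x[i] - x[j]
            grad[i] += 2.0 * lam * diff
            grad[j] -= 2.0 * lam * diff
        x -= step * grad
    return x


def disagreement(x, neighbors):
    """Total squared difference between neighboring keyframe estimates."""
    return sum(float(np.sum((x[i] - x[j]) ** 2)) for i, j in neighbors)


# Hypothetical data: 4 keyframes observing the same 8-sample depth profile.
rng = np.random.default_rng(0)
truth = np.linspace(1.0, 2.0, 8)
observations = truth + 0.05 * rng.standard_normal((4, 8))
neighbors = [(0, 1), (1, 2), (2, 3)]  # keyframes arranged in a chain

estimates = optimize_keyframes(observations, neighbors)
print("disagreement before:", disagreement(observations, neighbors))
print("disagreement after: ", disagreement(estimates, neighbors))
```

In this toy setting the regularizer pulls neighboring keyframes toward each other while leaving their average unchanged, which loosely mirrors the stated goal of making local results fit together in one consistent model.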
Related papers
- SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers [57.46911575980854]
We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation.
Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions.
Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavily noisy observations.
arXiv Detail & Related papers (2024-04-19T04:51:18Z)
- Wonder3D: Single Image to 3D using Cross-Domain Diffusion [105.16622018766236]
Wonder3D is a novel method for efficiently generating high-fidelity textured meshes from single-view images.
To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model.
arXiv Detail & Related papers (2023-10-23T15:02:23Z)
- Learning to Render Novel Views from Wide-Baseline Stereo Pairs [26.528667940013598]
We introduce a method for novel view synthesis given only a single wide-baseline stereo image pair.
Existing approaches to novel view synthesis from sparse observations fail due to recovering incorrect 3D geometry.
We propose an efficient, image-space epipolar line sampling scheme to assemble image features for a target ray.
arXiv Detail & Related papers (2023-04-17T17:40:52Z)
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z)
- Multi-View Neural Surface Reconstruction with Structured Light [7.709526244898887]
Three-dimensional (3D) object reconstruction based on differentiable rendering (DR) is an active research topic in computer vision.
We introduce active sensing with structured light (SL) into multi-view 3D object reconstruction based on DR to learn the unknown geometry and appearance of arbitrary scenes and camera poses.
Our method realizes high reconstruction accuracy in the textureless region and reduces efforts for camera pose calibration.
arXiv Detail & Related papers (2022-11-22T03:10:46Z)
- Few-shot Non-line-of-sight Imaging with Signal-surface Collaborative Regularization [18.466941045530408]
The non-line-of-sight imaging technique aims to reconstruct targets from multiply reflected light.
We propose a signal-surface collaborative regularization framework that provides noise-robust reconstructions with a minimal number of measurements.
Our approach has great potential in real-time non-line-of-sight imaging applications such as rescue operations and autonomous driving.
arXiv Detail & Related papers (2022-11-21T11:19:20Z)
- Learning Stereopsis from Geometric Synthesis for 6D Object Pose Estimation [11.999630902627864]
Current monocular-based 6D object pose estimation methods generally achieve less competitive results than RGBD-based methods.
This paper proposes a 3D geometric volume based pose estimation method with a short baseline two-view setting.
Experiments show that our method outperforms state-of-the-art monocular-based methods, and is robust in different objects and scenes.
arXiv Detail & Related papers (2021-09-25T02:55:05Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- Object-Centric Multi-View Aggregation [86.94544275235454]
We present an approach for aggregating a sparse set of views of an object in order to compute a semi-implicit 3D representation in the form of a volumetric feature grid.
Key to our approach is an object-centric canonical 3D coordinate system into which views can be lifted, without explicit camera pose estimation.
We show that computing a symmetry-aware mapping from pixels to the canonical coordinate system allows us to better propagate information to unseen regions.
arXiv Detail & Related papers (2020-07-20T17:38:31Z)
- NodeSLAM: Neural Object Descriptors for Multi-View Shape Reconstruction [4.989480853499916]
We present efficient and optimisable multi-class learned object descriptors together with a novel probabilistic and differential rendering engine.
Our framework allows for accurate and robust 3D object reconstruction which enables multiple applications including robot grasping and placing, augmented reality, and the first object-level SLAM system.
arXiv Detail & Related papers (2020-04-09T11:09:56Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)