SCONE: Surface Coverage Optimization in Unknown Environments by Volumetric Integration
- URL: http://arxiv.org/abs/2208.10449v1
- Date: Mon, 22 Aug 2022 17:04:14 GMT
- Title: SCONE: Surface Coverage Optimization in Unknown Environments by Volumetric Integration
- Authors: Antoine Guédon, Pascal Monasse, Vincent Lepetit
- Abstract summary: Next Best View computation (NBV) is a long-standing problem in robotics.
We show that we can maximize surface metrics by Monte Carlo integration over a volumetric representation.
It takes as input an arbitrarily large point cloud gathered by a depth sensor, such as a Lidar, together with camera poses, and predicts the NBV.
- Score: 23.95135709027516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Next Best View computation (NBV) is a long-standing problem in robotics, and
consists in identifying the next most informative sensor position(s) for
reconstructing a 3D object or scene efficiently and accurately. Like most
current methods, we consider NBV prediction from a depth sensor. Learning-based
methods relying on a volumetric representation of the scene are suitable for
path planning, but do not scale well with the size of the scene and have lower
accuracy than methods using a surface-based representation. However, the latter
constrain the camera to a small number of poses. To obtain the advantages of
both representations, we show that we can maximize surface metrics by Monte
Carlo integration over a volumetric representation. Our method scales to large
scenes and handles free camera motion: it takes as input an arbitrarily large
point cloud gathered by a depth sensor, such as a Lidar, together with camera
poses, and predicts the NBV. We demonstrate our approach on a novel dataset of
large and complex 3D scenes.
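
The key step, scoring candidate poses by Monte Carlo integration of a coverage gain over a volumetric representation, can be sketched in a few lines. The sketch below is a hypothetical illustration, not the authors' SCONE model: the frustum-only visibility test and the `occupancy_prob` / `visibility_prob` fields are illustrative stand-ins for the learned, occlusion-aware predictions the paper describes.

```python
import numpy as np

# Hypothetical sketch (not the authors' SCONE model): estimate the surface
# coverage gain of a candidate camera pose by Monte Carlo integration over
# volumetric proxy samples of the scene.

def coverage_gain(candidate_pose, proxy_points, occupancy_prob, visibility_prob,
                  n_samples=4096, rng=None):
    """Monte Carlo estimate of the expected number of newly visible surface
    points from `candidate_pose`.

    candidate_pose  : (camera position (3,), unit view direction (3,))
    proxy_points    : (N, 3) volumetric sample positions
    occupancy_prob  : (N,) probability that each sample lies on the surface
    visibility_prob : (N,) probability that each sample is already covered
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(len(proxy_points), size=min(n_samples, len(proxy_points)),
                     replace=False)
    pts = proxy_points[idx]

    # Toy visibility: inside a 45-degree field of view and within depth range.
    # (No occlusion reasoning here; a learned model would handle that part.)
    cam_pos, cam_dir = candidate_pose
    rays = pts - cam_pos
    depth = np.linalg.norm(rays, axis=1)
    cos_angle = (rays @ cam_dir) / np.maximum(depth, 1e-9)
    visible = (cos_angle > np.cos(np.deg2rad(45.0))) & (depth < 10.0)

    # Expected newly covered surface mass, rescaled to the full proxy set.
    gain = occupancy_prob[idx] * (1.0 - visibility_prob[idx]) * visible
    return gain.mean() * len(proxy_points)

# Usage: the NBV is the candidate pose with the highest estimated gain.
rng = np.random.default_rng(0)
pts = rng.uniform(-5.0, 5.0, size=(20000, 3))   # proxy samples of the volume
occ = rng.uniform(0.0, 1.0, size=20000)         # stand-in occupancy field
vis = rng.uniform(0.0, 1.0, size=20000)         # stand-in current coverage
candidates = [(np.array([8.0, 0.0, 0.0]), np.array([-1.0, 0.0, 0.0])),
              (np.array([0.0, 8.0, 0.0]), np.array([0.0, -1.0, 0.0]))]
nbv = max(candidates, key=lambda p: coverage_gain(p, pts, occ, vis, rng=rng))
```

Because the integral is estimated from random volumetric samples rather than a dense grid, the cost grows with the number of samples instead of the scene size, which is what lets this style of estimator scale to large scenes.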
Related papers
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose [44.13819148680788]
We develop a novel construct-and-optimize method for sparse view synthesis without camera poses.
Specifically, we construct a solution by using monocular depth and projecting pixels back into the 3D world.
We demonstrate results on the Tanks and Temples and Static Hikes datasets with as few as three widely-spaced views.
arXiv Detail & Related papers (2024-05-06T17:36:44Z)
- Incremental Joint Learning of Depth, Pose and Implicit Scene Representation on Monocular Camera in Large-scale Scenes [17.055674468733244]
We propose an incremental joint learning framework that achieves accurate depth and pose estimation as well as large-scale scene reconstruction.
A vision transformer-based network is adopted as the backbone to enhance performance in scale information estimation.
In terms of implicit scene representation, we propose an incremental scene representation method to construct the entire large-scale scene as multiple local radiance fields.
arXiv Detail & Related papers (2024-04-09T06:27:35Z)
- MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation [23.615122326731115]
We propose a novel solution that makes use of RGB video streams.
Our framework consists of three modules: a scale-aware monocular dense SLAM solution, a lightweight object pose predictor, and an object-level pose graph.
Our experimental results demonstrate that when utilizing public dataset sequences with high-quality depth information, the proposed method exhibits comparable performance to state-of-the-art RGB-D methods.
arXiv Detail & Related papers (2023-08-17T08:29:54Z)
- FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses via Pixel-Aligned Scene Flow [26.528667940013598]
Reconstruction of 3D neural fields from posed images has emerged as a promising method for self-supervised representation learning.
A key challenge preventing the deployment of these 3D scene learners on large-scale video data is their dependence on precise camera poses from structure-from-motion.
We propose a method that jointly reconstructs camera poses and 3D neural scene representations online and in a single forward pass.
arXiv Detail & Related papers (2023-05-31T20:58:46Z)
- Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
arXiv Detail & Related papers (2023-01-12T18:01:28Z)
- P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior [133.76192155312182]
We propose a method that learns to selectively leverage information from coplanar pixels to improve the predicted depth.
An extensive evaluation of our method shows that we set the new state of the art in supervised monocular depth estimation.
arXiv Detail & Related papers (2022-04-05T10:03:52Z)
- 3DVNet: Multi-View Depth Prediction and Volumetric Refinement [68.68537312256144]
3DVNet is a novel multi-view stereo (MVS) depth-prediction method.
Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions.
We show that our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics.
arXiv Detail & Related papers (2021-12-01T00:52:42Z)
- Back to the Feature: Learning Robust Camera Localization from Pixels to Pose [114.89389528198738]
We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model.
The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching.
arXiv Detail & Related papers (2021-03-16T17:40:12Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.